Yann Neuhaus Blog

All around Open Source and High Availability

Data Guard and ORA-04031 errors on standby

From time to time the Data Guard broker log file shows ORA-04031 errors. The primary site cannot connect to the standby site anymore.


DG 2008-04-09-23:31:44        0 2 0 RSM0: Failed to connect to remote database MYDB_SITE2. Error is ORA-04031

It seems that the available shared_pool is not sufficient on the standby site. Let's have a deeper look.

On the both sides (primary and standby) we have sga_target = sga_max_size = 800M :

SYS@MYDB > show parameter sga_

NAME               TYPE        VALUE
------------------ ----------- ------------
sga_max_size       big integer 800M
sga_target         big integer 800M

On the production server the current shared pool size is about 500 MB (automatically tuned by Oracle):

SYS@MYDB > @ssinipar
Wrote file temp.tmp
Enter value for parameter: shared_pool_size

Parameter                    Session      
---------------------------- -------------
__shared_pool_size           494927872
shared_pool_size             100663296
    

However, on the standby server the current shared pool site is about 240 MB :

Parameter                  Session      
-------------------------- -------------
__shared_pool_size         239075328
shared_pool_size           100663296

Furthermore, the standby database has less free shared pool than the primary database, it is also quite fragmented.

On standby:

POOL         NAME             BYTES
------------ ---------------- ----------
shared pool  free memory      18534000

On primary:

POOL         NAME             BYTES
------------ ---------------- ----------
shared pool  free memory      68049224

On the primary site a lot of resize operations have been performed by Oracle, it seems that this application has a higher parsing than I/O activity :

COMPONENT            CURRENT_SIZE OPER_COUNT LAST_OPER_TIME
-------------------- ------------ ---------- -------------------
shared pool             494927872        142 10-APR-008 00:47:23
streams pool              4194304         43 09-APR-008 22:05:46
DEFAULT buffer cache    260046848        185 10-APR-008 00:47:23

The streams pool size can be explained by data pump, which is used to perform the logical backup on the primary side.

The database requires much more shared pool that database block cache. On the standby side only 46 (23+23) SGA resize operations took place, compared to the 370 (142 + 43 + 185) and more on the primary side.

COMPONENT                 CURRENT_SIZE OPER_COUNT LAST_OPER_TIME
------------------------- ------------ ---------- -------------------
shared pool                  239075328         23 10-APR-008 15:11:31
DEFAULT buffer cache         520093696         23 10-APR-008 15:11:31

It is also very interesting to observe that the standby database handles much more database blocks than the primary database:

On the standby side:

select count(*) from v$bh;

  COUNT(*)
----------
     61748

On the primary database:

select count(*) from v$bh;

  COUNT(*)
----------
     30591

This can be explained by the fact that the standby database only performs recovery. The blocks are built from the standby redo log file, loaded into the buffer cache in order to be applied on the database which is in recovery mode.

However, the Data Guard broker and its processes also need memory in the shared pool to work. Unfortunately, because the standby database allocates much more database block cache memory than shared pool in the SGA, some ORA-4031 can occur.

Example statement on the standby site :

SELECT dest_id FROM v$archive_dest WHERE (target='LOCAL' OR target='PRIMARY') AND destination = '/export/oradata/MYDB/data99/archivelog'
AND (valid_type <> 'ONLINE_LOGFILE') AND (valid_role <> 'PRIMARY_ROLE') AND (dest_id <= 10) AND (dest_name not in ( select alternate from v
$archive_dest))

Several conclusions can be drawn from this analysis:

- After activation, the standby database will have a SGA configuration which may be in conflict with the application requirements (more cache size than shared pool)
- Oracle does not seem to transfer the SGA modifications from the primary to the standby site but sets the parameters to the requirements of each role (more cache size on the standby site)
- It is not a good idea to set a smaller SGA size on the standby side in order to save some memory space, this error will be amplified.

In some cases, flushing the shared pool could help:

SYS@MYDB > alter system flush shared_pool;

System altered.

A solution could be to set the shared_pool_size to a value corresponding to the lowest limit allowing to operate the standby database without (or with less) ORA-4031 errors. This low-limit seems to be at around 300-350 MB.

Kommentare

Keine Kommentare

Kommentar abgeben

(verpflichtend) 

(verpflichtend) 

(optional)

(verpflichtend)