Yann Neuhaus Blog

All around Open Source and High Availability
Data Guard and ORA-04031 errors on standby

From time to time the Data Guard broker log file shows ORA-04031 errors. The primary site cannot connect to the standby site anymore.


DG 2008-04-09-23:31:44        0 2 0 RSM0: Failed to connect to remote database MYDB_SITE2. Error is ORA-04031

It seems that the available shared_pool is not sufficient on the standby site. Let's have a deeper look.

On the both sides (primary and standby) we have sga_target = sga_max_size = 800M :

SYS@MYDB > show parameter sga_

NAME               TYPE        VALUE
------------------ ----------- ------------
sga_max_size       big integer 800M
sga_target         big integer 800M

On the production server the current shared pool size is about 500 MB (automatically tuned by Oracle):

SYS@MYDB > @ssinipar
Wrote file temp.tmp
Enter value for parameter: shared_pool_size

Parameter                    Session      
---------------------------- -------------
__shared_pool_size           494927872
shared_pool_size             100663296
    

However, on the standby server the current shared pool site is about 240 MB :

Parameter                  Session      
-------------------------- -------------
__shared_pool_size         239075328
shared_pool_size           100663296

Furthermore, the standby database has less free shared pool than the primary database, it is also quite fragmented.

On standby:

POOL         NAME             BYTES
------------ ---------------- ----------
shared pool  free memory      18534000

On primary:

POOL         NAME             BYTES
------------ ---------------- ----------
shared pool  free memory      68049224

On the primary site a lot of resize operations have been performed by Oracle, it seems that this application has a higher parsing than I/O activity :

COMPONENT            CURRENT_SIZE OPER_COUNT LAST_OPER_TIME
-------------------- ------------ ---------- -------------------
shared pool             494927872        142 10-APR-008 00:47:23
streams pool              4194304         43 09-APR-008 22:05:46
DEFAULT buffer cache    260046848        185 10-APR-008 00:47:23

The streams pool size can be explained by data pump, which is used to perform the logical backup on the primary side.

The database requires much more shared pool that database block cache. On the standby side only 46 (23+23) SGA resize operations took place, compared to the 370 (142 + 43 + 185) and more on the primary side.

COMPONENT                 CURRENT_SIZE OPER_COUNT LAST_OPER_TIME
------------------------- ------------ ---------- -------------------
shared pool                  239075328         23 10-APR-008 15:11:31
DEFAULT buffer cache         520093696         23 10-APR-008 15:11:31

It is also very interesting to observe that the standby database handles much more database blocks than the primary database:

On the standby side:

select count(*) from v$bh;

  COUNT(*)
----------
     61748

On the primary database:

select count(*) from v$bh;

  COUNT(*)
----------
     30591

This can be explained by the fact that the standby database only performs recovery. The blocks are built from the standby redo log file, loaded into the buffer cache in order to be applied on the database which is in recovery mode.

However, the Data Guard broker and its processes also need memory in the shared pool to work. Unfortunately, because the standby database allocates much more database block cache memory than shared pool in the SGA, some ORA-4031 can occur.

Example statement on the standby site :

SELECT dest_id FROM v$archive_dest WHERE (target='LOCAL' OR target='PRIMARY') AND destination = '/export/oradata/MYDB/data99/archivelog'
AND (valid_type <> 'ONLINE_LOGFILE') AND (valid_role <> 'PRIMARY_ROLE') AND (dest_id <= 10) AND (dest_name not in ( select alternate from v
$archive_dest))

Several conclusions can be drawn from this analysis:

- After activation, the standby database will have a SGA configuration which may be in conflict with the application requirements (more cache size than shared pool)
- Oracle does not seem to transfer the SGA modifications from the primary to the standby site but sets the parameters to the requirements of each role (more cache size on the standby site)
- It is not a good idea to set a smaller SGA size on the standby side in order to save some memory space, this error will be amplified.

In some cases, flushing the shared pool could help:

SYS@MYDB > alter system flush shared_pool;

System altered.

A solution could be to set the shared_pool_size to a value corresponding to the lowest limit allowing to operate the standby database without (or with less) ORA-4031 errors. This low-limit seems to be at around 300-350 MB.

Are Virtual IPs required for Data Guard?

Until 10.2.0.3, the Data Guard (and broker) technology suffered from several weaknesses. One of the "highlights" was that no Virtual IP management was provided by Oracle. This has quite important consequences for the clients as soon as the primary server is down (no IP responding to the TCP requests e.g. of a connection build up). In this case, the failover from the first to the second address (in the address list of the TNS entry used by the client) could take a long time, depending on network/TCP timeouts. The RAC/Clusterware architecture solved this issue by introducing the Virtual IPs (VIP) which where always available (maybe moved to a surviving node).

 

Of course several solutions are available to overcome that problem, for instance managing your own VIP with self-made scripts (and all the subsequent risks, you try to fake a failover cluster!), combine Data Guard with  Failover Cluster technology (e.g. Heartbeat under Linux, Veritas Cluster Services), and so on ...

 

All that solutions have as almost as many  drawbacks as advantages, the most important are:

 

- How can we garantee that the VIP is always started on the Primary database side?

- How is it possible to prevent two Virtual IP running in parallel in the cluster?

 

However, one of the mentioned solutions is almost "mandatory" for Oracle clients before 10.2.0.3.

 

Starting with 10.2.0.3 on client and server side, Oracle proposes the OUTBOUNT_CONNECT_TIMEOUT parameter (not very well documented though). This option must be set in the sqlnet.ora file and defines the maximum number of seconds to be waited before trying to access the second, third etc. address in the ADDRESS_LIST, independent of TCP/IP timeouts. Examples:

 

In sqlnet.ora:

OUTBOUND_CONNECT_TIMEOUT=5

 

In tnsnames.ora on the client side or in the LDAP server:

PCMDB =

  (DESCRIPTION =

   (ADDRESS_LIST=

    (ADDRESS = (PROTOCOL = TCP)(HOST = server1 )(PORT = 1521))

    (ADDRESS = (PROTOCOL = TCP)(HOST = server2 )(PORT = 1521))

   )

    (CONNECT_DATA =

      (SERVICE_NAME = PCMDB_RW )

    )

  )

 

PCMDB_RW is a service created beforehand and activated through a "startup database" trigger (see end of the post). This makes sure, that the client can only connect to the primary side. If after 5 seconds (in our case), server1 is still unreachable, the client will try to connect to server2 (ignoring existing TCP/IP parameters).

 

Unfortunately, during our tests we observed that this parameter requires the remote_listener parameter to be set on the server side, at least for Linux/Sun(/... ?) clients. This for the non-error situation that a client first tries the standby database (and therefore should failover to the primary). For windows clients, patch 6038241 (10.2.0.3.7P) can be installed (only available for 32 Bit), see metalink note 342443.1. With this patch, the remote_listener parameter does not have to be.

 

So the Windows clients have a valid/stable solution using the patch.

 

The Linux clients however suffer from the side effects of the remote_listener parameter: The remote_listener parameter is used to register to other (remote) listeners in order to inform them about the current instance load. This means that in case of crash of the primary server and activation of the standby database, the service PCMDB_RW is known by the listener of the standby side as being offered by the new primary (ok) and the old primary (not ok)!

 

This situation may result in "ORA-12560: TNS:protocol adapter error" errors.

 

The solution is quite simple: the listener should be "reloaded" on the new primary side (old standby) when the standby database becomes the primary. This could be easily automated with a script and a DB_CHANGE_ROLE trigger. See the code below :

 

Script to reload the listener:

#!/bin/ksh

# Set the ORACLE environment variables (ORACLE_HOME, PATH, aso ...)

lsnrctl reload LISTENER_PCMDB

 

Below the trigger to reload the listener:

CREATE OR REPLACE TRIGGER RELOAD_LISTENER AFTER DB_ROLE_CHANGE ON DATABASE

BEGIN

dbms_scheduler.create_job(job_name=>'reload_lsn',

job_type=>'executable',job_action=>

'/u00/app/oracle/local/custom/bin/reload_lsn_PCMDB.ksh',

enabled=>TRUE);

END;

/

 

With this solution, both Linux and Windows clients do not require any VIP or Failover Cluster anymore to fully benefit from a Data Guard architecture. The only 'restriction' is to use 10.2.0.3 Oracle client releases. Even 10.2.0.2 does not work properly. Of course, wait for 10.2.0.4 and hope that the "problem" is fixed could be another solution :-)

 

Appendix : startup trigger for "_RW" service :

create or replace trigger service_trigger after

startup on database

declare

db_name       varchar(512);

db_domain     varchar(512);

database_role varchar(16);

service_names varchar(512);

instance_name varchar2(16);

begin

  select value into service_names from v$parameter where name = 'service_names';

  select value into instance_name from v$parameter where name = 'instance_name';

  select value into db_name from v$parameter where name = 'db_name';

  select value into db_domain from v$parameter where name = 'db_domain';

  select database_role into database_role from v$database;

  if database_role = 'PRIMARY' then

    service_names := rtrim(service_names || ', ' || db_name || '_RW.' || db_domain, '.');

  else

  service_names := rtrim(service_names || ', ' || db_name || '_RO.' || db_domain, '.');

  end if;

  execute immediate 'alter system set service_names = ''' || service_names || ''' scope=memory sid=''' || instance_name || '''';

end;

/

 
Change Data Guard configuration file location “online”

This short post presents how to change online the Data Guard configuration files location (parameters dg_broker_config_fileX) where X is 1 or 2.

 

REM ================================================
REM renaming of the dg_broker_config_file parameters
REM================================================= 

set head off
set feed off
set echo off
set verify off 
set trimspool on 

ACCEPT site_number CHAR PROMPT 'Enter site number <2: ' DEFAULT '2' 

var filename_dg1 varchar2(2000)
var filename_dg2 varchar2(2000) 

spool rename_dg_broker_config_file_do.sql 

select 'alter system set dg_broker_start=false scope=memory'||chr(10)||'/'
from dual
/ 

begin
select '/u00/app/oracle/admin/'||name||'/pfile/dr1_'||name||'_SITE'||&site_number||'.dat' into :filename_dg1
from v$database;
select '/u00/app/oracle/admin/'||name||'/pfile/dr2_'||name||'_SITE'||&site_number||'.dat' into :filename_dg2
from v$database;
end;
/ 

select '!mv '||value||' '||:filename_dg1
from v$parameter
where name = 'dg_broker_config_file1'
/ 

select '!mv '||value||' '||:filename_dg2
from v$parameter
where name = 'dg_broker_config_file2'
/ 

select 'alter system set dg_broker_config_file1=''/u00/app/oracle/admin/'||name||'/pfile/dr1_'||name||'_SITE'||&site_number||
'.dat'''||
chr(10)||'/'
from v$database
/ 

select 'alter system set dg_broker_config_file2=''/u00/app/oracle/admin/'||name||'/pfile/dr2_'||name||'_SITE'||&site_number||
'.dat'''||
chr(10)||'/'
from v$database
/ 

select 'alter system set dg_broker_start=true scope=memory'||chr(10)||'/'
from dual
/ 

select chr(10) from dual; 

spool off 
set echo on
set feed on print :filename_dg1

print :filename_dg2 

spool rename_dg_broker_config_file_do.log 

@@rename_dg_broker_config_file_do.sql 

spool off

 

 

 We could consider that the script is not REALLY online because the Data Guard broker is stopped Wink, but this has no impact on the instance availability.

Datafile offline/online issue - Data Guard issue after failover/switchover

At a customer site (Oracle 10.2.0.3, AIX, CIO enabled), the following error occurred during some "datafile move" actions on the database . Note that this post only presents the solution on a real-life problem without knowing exactly if the issue was related to CIO, flashback or any Oracle or AIX “feature” :

 

- Database was previously in a Data Guard (DG) configuration

- Flashback is activated for the database

 

The performed actions were:

1. offline tablespace, test if the file can be copied (with a simple “cp” command). File was still locked (may be due to CIO).

2. online tablespace, an error occurs :

SQL> alter tablespace CONTRAINTESG2 online;

alter tablespace CONTRAINTESG2 online

*

ERROR at line 1:

ORA-01113: file 22 needs media recovery

ORA-01110: data file 22: '/export/oradata/G2DBR1/data01/contraintesg2_db_01.dbf'

 

Of course we directly tried to recover the datafile :

SQL> recover datafile 22;

ORA-00283: recovery session canceled due to errors

ORA-38727: FLASHBACK DATABASE requires a current control file.

 

No way ! Impossible to online the datafile. We also observed that the database datafiles aren't backed up since a while (we checked in the RMAN backup log file). However, the actual control file knows the datafiles (visible with "alter database backup controlfile to trace"). The controlfile is also in a correct state (CURRENT) :

SQL> select controlfile_type from v$database;

 

CONTROL

-------

CURRENT

 

It seems that during the “recover datafile”, Oracle tries to perform a flashback of the database and complains because it feels it is a BACKUP controlfile. The control file is however really a CURRENT controlfile. This problem is described in the note 438437.1. No restore point has been created :

SQL> select * from v$restore_point;

 

no rows selected

 

What we had to do to solve the problem ? :

 

1. Remove the flashback settings for the database.

SQL> shutdown immediate;

SQL> startup mount;

SQL> alter database flashback off;

SQL> alter database open;

 

SQL> r

  1* select file#,status from v$datafile

 

     FILE# STATUS

---------- -------

         1 SYSTEM

         2 ONLINE

...

        21 ONLINE

        22 RECOVER       
        23 RECOVER

        24 ONLINE

 

24 rows selected.

 

2. Try to recover the datafile, an ORA-600 appeared :

SQL> recover datafile 22;

ORA-00283: recovery session canceled due to errors

ORA-00600: internal error code, arguments: [kccfhb_1], [4], [], [], [], [], [], []

 

3. Creation of a new controlfile with NORESETLOGS (statement coming from "alter database backup controlfile to trace")):

SQL> @/export/soft/oracle/admin/G2DBR1/udump/cr_ctl.sql

 

Control file created.

 

SYS> recover database;

SYS> alter database open;

 

All files are online :

SYS> select file#,status from v$datafile;

 

     FILE# STATUS

---------- -------

         1 SYSTEM

         2 ONLINE

...

        22 ONLINE

        23 ONLINE       
       
24 ONLINE
 

24 rows selected.

 

The last step is of course to create the temporary tablespaces Wink

 

Last update 14.01.2008 :

 

After some additionnal tests with the customer we observed that the following scenery makes a Data Guard configuration in a very dangerous situation:

 

On a Data Guard configuration, perform a failover, then a switchover to get the initial situation (after reinstate, of course, from the old primary).

Afterwards, perform backups with a RMAN catalog

Add new datafiles on the database

These last added datafiles won't be backed up by RMAN anymore !

 

Only a backup without catalog will consider and save these datafiles.

 

Performing a RESYNC catalog doesn't solve the problem, the catalog doesn't know these new datafiles.

 

The only solution is to re-create the control file ! A SR has been opened by the customer.

 

Oracle Backup with Media Manager and Flashback Recovery Area (FRA)
Managing RMAN backups with FRA (Flashback Recovery Area) is definitively not "out of the box". Consider the following situation: the RMAN backup are performed in the Flashback Recovery Area in a first step (before going to tape). In a second step, the backup are saved on tape from the FRA with the following policies : DAILY, WEEKLY, MONTHLY. Each policy has its own retention (5 days, 5 weeks, 13 months).

The used media manager is NetBackup from Veritas. The backup scripts look like:

Backup of the database:
run {
allocate channel bck_chan1 type disk;
backup as compressed backupset incremental level 1 tag inc1_daily_dbf database;
backup current controlfile for standby
tag inc1_daily_std_ctrl;
sql "alter system archive log current";
backup filesperset 300 archivelog all not backed up 2 times tag inc1_daily_arch;
backup current controlfile tag inc1_daily_cur_ctrl;
backup spfile tag inc1_daily_spfile;
sql "alter database backup controlfile to trace";
backup filesperset 300 archivelog until time 'SYSDATE - 3' not backed up 2 times delete input tag inc1_daily_arch;
release channel bck_chan1;
}


Backup of the FRA:
run
{
allocate channel bck_chan1 type 'SBT_TAPE';
send  'NB_ORA_CLASS=Net0WinOracleGP, NB_ORA_SERV= NetBackupServ, NB_ORA_SCHED=DaillySchedule,NB_ORA_CLIENT=MyClient';
backup recovery area;
release channel bck_chan1;
}


If you didn't setup any RMAN catalog house keeping from the begin you could be confronted with large crosscheck/delete times the first time. The goal is to remove the RMAN catalog entries because they were already deleted from the media manager (out of the retention period).

A way to optimize the crosscheck/delete procedure, is to perform a first crosscheck with the NetBackup parameters (an access to the tape media manager will be performed) : the backup no more existing on tape will be set to “EXPIRED”. In a second step delete the “EXPIRED” backup with a new channel without NetBackup channel parameters (no tape access required, this will be faster):

For instance, to delete all the backups older then 100 days:
allocate channel for maintenance type 'SBT_TAPE';

# Set the NetBackup parameters
send 'NB_ORA_CLASS=Net0WinOracleGP, NB_ORA_SERV=NetBackupServ, NB_ORA_SCHED=DaillySchedule,NB_ORA_CLIENT=myclient';

# Perform the crosscheck
crosscheck backup completed before 'sysdate - 100';

# This second allocate avoids any netbackup access because no NetBackup
# variables are set.
allocate channel for maintenance type 'SBT_TAPE';

# Delete the entries in the catalog :
delete force noprompt expired backup completed before 'sysdate - 100';

But the problem is not yet totally solved. Indeed some entries have been deleted from the catalog which MAY (!) be still available on the FRA ! This could be due to the fact that the FRA is very large. This can be easily verified by a simple “ls” command in the FRA :
ls -lrt /export/oradata/ORACLE_SID/data99/flash_recovery_area/ORACLE_SID_SITE1/backupset/

In fact Oracle doesn't delete the FRA backupset as long as space if available in the FRA. Also the RMAN command “delete “EXPIRED”” does NOT delete the FRA backupsets.

The consequence is very simple: on the next scheduled daily backup all the backupset still available in the FRA will be backed up again on the Media Manager. This could generate a bad surprise because the backup will suddenly take much more time (depending on the size of the FRA).

It is required to suppress the backupset manually from the FRA through the following command:
delete obsolete recovery window of 100 days device type disk;

The FRA seems to clear the old backup set automatically when space is required, but when lot of space is available in the FRA the backup remain and must be cleaned manually.

It is also worth to mention that this effect also takes place even if only one retention period has been defined. The only factor limiting the number of backup stored in the FRA is the place available and not the retention period of the RMAN backup strategy.

One of the numerous hidden FRA “feature”. However the benefit of the FRA in a Dataguard environment largely covers the several drawbacks.
Dataguard configuration modification to MAXIMUM AVAILABILITY

In order to change the Dataguard configuration from MAXIMUM PERFORMANCE to MAXIMUM AVAILABILITY, Oracle doesn't require to bounce the databases anymore since 10.2.0.3.

In fact this is not always the case. The only way to verify if it will occur is:

1. change log transport mode

edit database 'MYDB_SITE1' set property LogXptMode = 'SYNC';
edit database 'MYDB_SITE2' set property LogXptMode = 'SYNC';

2. look for the following entry in the alert.log file of the PRIMARY database :

Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED

If this string has been found, the Dataguard configuration can be changed smoothly (without database reboot) to MAXIMUM AVAILABILITY as described below :

edit configuration set protection mode as 'MaxAvailability';

Of course the standby redo log files must exist, see previous posts.

If the entry in the alert.log file has not been found a workaround could be to to set the LogXptMode property to 'ASYNC' and then back to 'SYNC'.

Many thanks to Hervé who was the first one to discover this by Trivadis :-)

Data Guard in SYNC mode – does my DB suffer from network weakness ?

My Grid Control console is often polluted by warnings about wait events of the network event class :



Is my Data Guard setup really suffering from poor performance?

I operate with the Maximum Availability mode.

We will investigate why the Grid Control warnings occur. The database is up since about 3 days (72 hours). First of all try to identify the wait classes which generated the most important wait times:

set lines 130
select * from V$SYSTEM_WAIT_CLASS
order by 5
/

WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS          TOTAL_WAITS TIME_WAITED
------------- ----------- ------------------- ----------- -----------
   4217450380           1 Application                1472         434
   3875070507           4 Concurrency                1178        1338
   3290255840           2 Configuration              1097        5250
   3386400367           5 Commit                     3512        6981
   4166625743           3 Administrative             4906      111527
   2000153315           7 Network                 2549790      325016
   1740759767           8 User I/O                1384276      427440
   4108307767           9 System I/O              3151272      646415
   1893977003           0 Other                    251373     2207566
   2723168908           6 Idle                    3720999   639262672

In the database, after a performance alert occurred in Grid Control, we can identify the problem in V$WAITCLASSMETRIC_HISTORY :

select BEGIN_TIME, END_TIME, DBTIME_IN_WAIT, TIME_WAITED WAIT_COUNT
  from V$WAITCLASSMETRIC_HISTORY
 where WAIT_CLASS# = 7; # 7 for network

BEGIN_TIME           END_TIME               DBTIME_IN_WAIT TIME_WAITED WAIT_COUNT
-------------------- --------------------   -------------- ----------- ----------
25-SEP-2007 11:00:54 25-SEP-2007 11:01:54   21.0387822     2143.1923        19262
25-SEP-2007 10:59:54 25-SEP-2007 11:00:54   51.2713653     2147.2517        19047
25-SEP-2007 10:58:54 25-SEP-2007 10:59:54   55.8887214     2480.2726        18405
25-SEP-2007 10:57:53 25-SEP-2007 10:58:54   44.2017456     2374.9075        21223
25-SEP-2007 10:56:53 25-SEP-2007 10:57:53   33.1204214     2676.6979        28336
25-SEP-2007 10:55:53 25-SEP-2007 10:56:53   37.3184263     3498.7629        30965
25-SEP-2007 10:54:54 25-SEP-2007 10:55:53   27.779576      3130.5472        29403
25-SEP-2007 10:53:54 25-SEP-2007 10:54:54   26.117611      2752.5801        23430

Now we will try to analyze in depth the events mentioned in Grid control :

"Database Time Spent Waiting (%)" is at 88.26568 for event class "Network"

Let’s have a look at the primary site. According to the white paper presented in the Metalink note 387174.1, the event “LNS wait on SENDREQ” covers the sequence of the following actions (when a commit transaction is propagated to the standby database).

(Extract of the Metalink note:)
The RFS process receives the redo being sent by LNS and completes the I/O to the standby redo log (important to understand the analysis below).

V$SYSTEM_EVENT is the view which will allow us to identify the current waits events :

select EVENT, TOTAL_WAITS, TOTAL_TIMEOUTS, TIME_WAITED, AVERAGE_WAIT
from v$system_event
where event like '%LNS%'
   or event like '%LGWR%'
/

EVENT                     TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
------------------------- ----------- -------------- ----------- ------------
. . .
LNS wait on SENDREQ            125643              0      156673         1.25
. . .

On the standby side we perform almost the same select (a few seconds later) , to identify the LNS/RFS related wait times :


select EVENT, TOTAL_WAITS, TOTAL_TIMEOUTS, TIME_WAITED, AVERAGE_WAIT

  from v$system_event
 where event like '%RFS%'
 order by 4
/

EVENT                     TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
------------------------- ----------- -------------- ----------- ------------
. . .
RFS write                      125270              0      121825          .97
. . .


We clearly observe that out of the 156673 ms waited because of "LNS wait on SENDREQ" (on the primary, see above for detailed description of this event), 121825 ms are due to the waits on writing to the standby redo log files on the standby side ("RFS write")

In the whole propagation/synchronization process between primary and standby the most important part is simply the writing on the standby redo log files (normal I/Os to keep the transactions on the standby side) and not the network as expected.

It is worth to mention that the "redo write time" on the primary is 128333 ms which means that the 121825 ms of waits for the RFS writes are fully acceptable (normal I/O operations/performance)

During these 3 days while the database was up and running having waits of about 120000 to 130000 ms (120 to 130 s) is absolutely negligible!

As a conclusion: although the most important waits on the primary database are related to the network, there are negligible in absolute values. 90% of 120 seconds of waits in 3 days is still only 1,66 second of wait per hour in average

More details on www.trivadis.com -> publications :

http://www.trivadis.com/Images/Dataguard_network_tuning_tvd_layout_tcm16-16758.pdf

 

Dataguard broker setup and Standby Redo Logs

You wanna create an Oracle Dataguard configuration working in Maximum availability. This configuration uses of course standby redo log files. These files must be available on the primary and standby side (because of switchover : primary database becomes standby database). In your setup process you decided to create the standy redo log files before duplicating the primary database. After the creation of the Dataguard configuration, the standby database seems to miss the standby redo log  files.

RFS[3]: No standby redo logfiles created
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_86_599565590.arc'

Below the description of the problem and the solution.

First of all you create the standby redo log files on the primary database :

alter database add standby logfile group 4
('/u00/oradata/DG/sredog4m1DG.dbf',
 '/u01/oradata/DG/sredog4m2DG.dbf') SIZE 6M reuse;
alter database add standby logfile group 5
('/u00/oradata/DG/sredog5m1DG.dbf',
 '/u01/oradata/DG/sredog5m2DG.dbf') SIZE 6M reuse;
alter database add standby logfile group 6
('/u00/oradata/DG/sredog6m1DG.dbf',
 '/u01/oradata/DG/sredog6m2DG.dbf') SIZE 6M reuse;
alter database add standby logfile group 7
('/u00/oradata/DG/sredog7m1DG.dbf',
 '/u01/oradata/DG/sredog7m2DG.dbf') SIZE 6M reuse;


Backup your primary database for standby with RMAN :

RMAN> connect target /
RMAN> backup incremental level 0 database;
RMAN> backup current controlfile for standby;
RMAN> backup archivelog all delete input;

Start (nomount) the standby side :

SQL> startup nomount;
ORACLE instance started.

From the primary server, start RMAN to duplicate for standby :

RMAN> connect target /
connect connected to target database: DG (DBID=1512896598)

RMAN> connect auxiliary sys/manager@DG_SITE2
connected to auxiliary database: DG (not mounted)

RMAN> duplicate target database for standby dorecover nofilenamecheck;

After the duplicate verify the status on the standby database :

SQL> select database_role from v$database;

DATABASE_ROLE
----------------
PHYSICAL STANDBY

The standby redo log files have been correctly duplicated, but the files are NOT created on the filesystem.

SQL> select group# from v$standby_log;

    GROUP#
----------
         4
         5
         6
         7

Now you can create your dataguard configuration with dgmgrl from the primary side :

connect sys/manager

create configuration 'DG' as
  primary database is 'DG_SITE1'
  connect identifier is 'DG_SITE1.trivadistraining.com';
add database 'DG_SITE2' as
  connect identifier is 'DG_SITE2.trivadistraining.com'
  maintained as physical;
edit database 'DG_SITE1' set property StandbyArchiveLocation='/u00/app/oracle/admin/DG/arch';
edit database 'DG_SITE1' set property StandbyFileManagement='AUTO';
edit database 'DG_SITE1' set property LogXptMode='ARCH';
edit database 'DG_SITE2' set property StandbyArchiveLocation='/u00/app/oracle/admin/DG/arch';
edit database 'DG_SITE2' set property StandbyFileManagement='AUTO';
edit database 'DG_SITE2' set property LogXptMode='ARCH';
enable configuration;
show configuration;

On the standby side , following occured (extract of the alert.log file of the standby database) :

Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  NODELAY
Sun Sep 23 14:52:54 2007
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[3]: Assigned to RFS process 7535
RFS[3]: Identified database type as 'physical standby'
RFS[3]: No standby redo logfiles created
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_85_599565590.arc'
Sun Sep 23 14:52:55 2007
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_85_599565590.arc
Media Recovery Waiting for thread 1 sequence 86
Sun Sep 23 14:54:39 2007
RFS[3]: No standby redo logfiles created
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_86_599565590.arc'
RFS[3]: No standby redo logfiles created
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_87_599565590.arc'
Sun Sep 23 14:54:40 2007
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_86_599565590.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_87_599565590.arc

After having enabled the Dataguard configuration, the standby redo log disappeared :

SQL> select group# from v$standby_log;

no rows selected

The standby database complains in the alert.log file :

RFS[3]: No standby redo logfiles created
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_86_599565590.arc'

The solution is easy, stop the log apply mode and create the standby redo log files on the standby database :

On the primary database (in dgmgrl) :

DGMGRL> edit database 'DG_SITE2' set state = 'LOG-APPLY-OFF';
Succeeded.

On standby side (see the script above) :

SQL> @create_standy_log.sql

Database altered.
Database altered.
Database altered.
Database altered.

On the primary database (in dgmgrl) acivate the standby site again :

DGMGRL> edit database 'DG_SITE2' set state = 'ONLINE';
Succeeded.

To verify, in sqlplus on the primary side :

SQL> alter system archive log current;

System altered.


On the standby side , the database doesn't complain anymore and uses the standby redo log files :

RFS[5]: Successfully opened standby log 4: '/u00/oradata/DG/sredog4m1DG.dbf'
Sun Sep 23 14:57:28 2007
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_91_599565590.arc
Media Recovery Waiting for thread 1 sequence 92

Note also that even if the mode is still Maximum Performance and the transport mode ARCH, the standby redo log files will be used to store the archived files from the primary.

xv - An image viewer on Ubuntu 7.04

Sometimes, when switching to new technologies/environments, it is still nice to be able to use some old software. Although there may be equivalents ;-) . In my case, a 6 year SuSE user, I just finished installing a really nice Linux variant on my laptop: Ubuntu 7.04. Although Ubuntu offers all the stuff an IT consultant requires, I missed one tool I really got used to: xv. To check and modify pictures and grab parts of the screen.




I could not find an xv package for Ubuntu, even after installing the nice extension coming along with automatix2 (http://www.getautomatix.com):
'apt-cache search xv' revelaed nothing...

After googling the internet a bit, I found some information about how to create an xv package on Ubuntu (for instance on http://bok.fas.harvard.edu/debian/xv/index.html). I try to consolidate them in this blog entry and hope you find the useful.

Before installing xv on Ubuntu, you will need some preparation. First of all, install the libraries required during the xv compilation and installation.

If your computer is behind a proxy, set the following variables to use the proxy:

export http_proxy=http://myproxy:8080
export ftp_proxy=http://myproxy:8080

Then install the required packages:

sudo apt-get install xlibs-dev
sudo apt-get install dpkg-dev
sudo apt-get install libc6-dev
sudo apt-get install libtiff3g-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libpng-dev zlib1g-dev
sudo apt-get install libtiff-dev
sudo apt-get install zlib-bin
sudo apt-get install zlibc
sudo apt-get install zlib1g-dev
sudo apt-get install zlib1g
sudo apt-get install libpng3
sudo apt-get install libpng12-dev
sudo apt-get install libpng12-0


After that, download the xv source, the jumbo patches and the debian patch (remember, Ubuntu is a Debian decendant):

ftp://ftp.cis.upenn.edu/pub/xv/xv-3.10a.tar.gz
http://prdownloads.sourceforge.net/png-mng/xv-3.10a-jumbo-patches-20050501.tar.gz
http://bok.fas.harvard.edu/debian/xv/xv-3.10a-jumbo20050501-1.diff.gz


During installation of xv, the bash shell will be needed. On Ubuntu 7.04, the default shell (/bin/sh) is linked to /bin/dash. Change that link so that it points to bash (There may be other solutions, but this worked for me most easily)

ls -lrt /bin/sh
lrwxrwxrwx 1 root root 4 2007-09-14 10:49 /bin/sh -> bash


No we start the real stuff:

tar xvzf xv-3.10a.tar.gz
tar xvzf xv-3.10a-jumbo-patches-20050501.tar.gz
gzip -d xv-3.10a-jumbo20050501-1.diff.gz
cd xv-3.10a


Apply the patches:

patch -p1 < ../xv-3.10a-jumbo-fix-patch-20050410.txt
patch -p1 < ../xv-3.10a-jumbo-enh-patch-20050501.txt
patch -p1 < ../xv-3.10a-jumbo20050501-1.diff
chmod 755 debian/rules


Build the package:

dpkg-buildpackage -rfakeroot -uc -b


Install the package:

sudo dpkg -i xv_3.10a-jumbo20050501-1_i386.deb
sudo dpkg -i xv-doc_3.10a-jumbo20050501-1_all.deb


That was it! I hope I could help you. If you do not want to build the packages by yourself, just send me a note.

Posted: Sep 19 2007, 06:59 von Yann Neuhaus | mit 4 comment(s)
Abgelegt unter: , ,