Recover from Lost Online Redo Logs with Data Guard


Hi there!

In my last Data Guard course this week I gave my participants the task of recovering from a situation in which they had lost the online redo log files of the primary database. They had a Maximum Availability setup with synchronous log transfer, so no data was lost when they performed the failover to the standby site. The participants failed over correctly; some also used an observer for this. The harder task after the failover was the reinstatement of the former primary, the one with the lost logs. Reinstatement means turning it back into a standby right away, saving the time it would take to rebuild it from scratch. For this to work, Flashback Database must have been enabled beforehand.
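
By the way, a quick way to verify this on each site is to query v$database:

SQL> select flashback_on from v$database;

If it returns NO, Flashback Database can be enabled with "alter database flashback on"; in 10g this requires the database to be mounted (not open) and a configured flash recovery area.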

In this blog entry I want to summarize the needed steps for the whole scenario.

The environment looked like this:
Oracle 10.2.0.4.0, Linux x86_64, Flashback Database enabled on both primary and standby

Data Guard Configuration Overview:

DGMGRL> connect sys@DG_SITE1
Password:
Connected.
DGMGRL> show configuration

Configuration
  Name:                DG
  Enabled:             YES
  Protection Mode:     MaxAvailability
  Fast-Start Failover: DISABLED
  Databases:
    DG_SITE1 - Primary database
    DG_SITE2 - Physical standby database

Current status for "DG":
SUCCESS

Step One – Lose the Online Logs at Primary
Oh yeah, that's the fun part! :-)

I am going to delete them, but of course I first made them important. :-) After a bunch of nonsense transactions, two of my three online log groups are ACTIVE and the third is CURRENT, which means they all still contain changes that have not yet been written to the data files by the database writer:
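
The nonsense transactions can be anything that produces enough redo; a minimal sketch, using a hypothetical scratch table called redo_junk, could look like this:

-- a throwaway table, just big enough to generate a good amount of redo
SQL> create table redo_junk as select * from dba_objects;

-- double its size a couple of times, then commit
SQL> insert into redo_junk select * from redo_junk;
SQL> commit;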

SQL> select group#, sequence#, members, status from v$log;

    GROUP#  SEQUENCE#    MEMBERS STATUS
---------- ---------- ---------- ----------------
         1         75          2 ACTIVE
         2         76          2 ACTIVE
         3         77          2 CURRENT


Now, dear logs, please go away :-) Let's generate the rm commands:

SQL> select 'rm ' || member from v$logfile f, v$log l
  2  where f.group# = l.group#;

'RM'||MEMBER
-----------------------------------
rm /u00/oradata/DG/redog1m1DG.dbf
rm /u01/oradata/DG/redog1m2DG.dbf
rm /u00/oradata/DG/redog2m1DG.dbf
rm /u01/oradata/DG/redog2m2DG.dbf
rm /u00/oradata/DG/redog3m1DG.dbf
rm /u01/oradata/DG/redog3m2DG.dbf

6 rows selected.


After running those six rm commands, I switched the log files until the database hung; it was the third switch. The alert log shows a couple of errors like these:

Fri Nov  6 10:58:45 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_arc1_21328.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/oradata/DG/redog3m2DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 3 thread 1: '/u00/oradata/DG/redog3m1DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3


Step Two – Failover
Okay, now it is time to let our users work again, on the standby site. So let's activate the standby.

DGMGRL> connect sys@DG_SITE2
Password:
Connected.
DGMGRL> failover to 'DG_SITE2'
Performing failover NOW, please wait...
Failover succeeded, new primary is "DG_SITE2"


I also shut down the old primary with "shutdown abort".

That’s all. The users are happy and can connect to the new primary database. No data loss occurred due to the synchronous setup.
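
A quick sanity check on the new primary could look like this; it should report the PRIMARY role, READ WRITE open mode and the current protection mode:

SQL> select database_role, open_mode, protection_mode from v$database;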


Step Three – Rebuild the Redundancy with Reinstatement

The Data Guard configuration now looks like this:

DGMGRL> connect sys@DG_SITE2
Password:
Connected.
DGMGRL> show configuration;

Configuration
  Name:                DG
  Enabled:             YES
  Protection Mode:     MaxPerformance
  Fast-Start Failover: DISABLED
  Databases:
    DG_SITE1 - Physical standby database (disabled)
    DG_SITE2 - Primary database

Current status for "DG":
SUCCESS


Note: The configuration has automatically fallen back to Maximum Performance mode because, for the moment, there is only a single database left and no standby to ship redo to.

The next step is to restart the former primary in mount mode so that it can be reinstated.
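
On the old primary, connected as SYSDBA, that is simply:

SQL> startup mount

After that, the broker tells us how to proceed: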

DGMGRL> connect sys@DG_SITE1
Password:
Connected.
DGMGRL> show configuration;
Error: ORA-16795: database resource guard detects that database re-creation is required

Configuration details cannot be determined by DGMGRL

The broker demands the reinstatement of DG_SITE1, which an observer would have done automatically. Let's do this now:

DGMGRL> connect sys@DG_SITE2
Password:
Connected.
DGMGRL> reinstate database 'DG_SITE1';
Reinstating database "DG_SITE1", please wait...
Error: ORA-16653: failed to reinstate database

Failed.
Reinstatement of database "DG_SITE1" failed

The reinstatement failed; let's have a look at the alert log:

FLASHBACK DATABASE TO SCN 335290
Fri Nov  6 11:20:45 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_rsm0_21445.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/u01/oradata/DG/redog1m2DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 1 thread 1: '/u00/oradata/DG/redog1m1DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Fri Nov  6 11:20:45 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_rsm0_21445.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/oradata/DG/redog2m2DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 2 thread 1: '/u00/oradata/DG/redog2m1DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Fri Nov  6 11:20:45 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_rsm0_21445.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/oradata/DG/redog3m2DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 3 thread 1: '/u00/oradata/DG/redog3m1DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-38754 signalled during: FLASHBACK DATABASE TO SCN 335290...


The log message from drcDG.log, which is located in the same directory, is even more frustrating:

DG 2009-11-06-11:20:45        0 2 0 Physical RSM: Reinstatement failed. Flashback database to SCN 335290 fails. If there is insufficient flashback log to flashback this database, reinstatement will never succeed. You will have to do a full physical standby re-instantiation of this database.

Okay, the flashback operation triggered by the reinstatement failed because of the missing online logs.
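
It is also worth checking whether the flashback logs reach back far enough at all; v$flashback_database_log shows the oldest SCN we could flash back to:

SQL> select oldest_flashback_scn, oldest_flashback_time
  2  from v$flashback_database_log;

As we will see, the flashback window is not the problem here; it really is just the missing online logs.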

Hmm, let's clear the logs:

SQL> alter database clear unarchived logfile group 1;

Database altered.

SQL> alter database clear unarchived logfile group 2;
alter database clear unarchived logfile group 2
*
ERROR at line 1:
ORA-01624: log 2 needed for crash recovery of instance DG (thread 1)
ORA-00312: online log 2 thread 1: '/u00/oradata/DG/redog2m1DG.dbf'
ORA-00312: online log 2 thread 1: '/u01/oradata/DG/redog2m2DG.dbf'


SQL> alter database clear unarchived logfile group 3;

Database altered.

Okay, I could not clear log file group 2 because the control file knows that this group is still needed for crash recovery. That is also why the flashback would still fail. So we need this redo information, and we should be able to find it at our new primary, right? Let's first query what we need:

SQL> select first_change# from v$log where group#=2;

FIRST_CHANGE#
-------------
       335153

On the new primary, after a log switch, we search for this SCN in the archived redo logs:

SQL> select name from v$archived_log where FIRST_CHANGE#>=335153;

NAME
-----------------------------------------------------
/u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc
/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc

So I copy those two logs, which belong to two different incarnations (because of the resetlogs performed during the failover), from the new primary to the former primary and register them there with RMAN:

RMAN> catalog start with '/u00/app/oracle/admin/DG/arch';

using target database control file instead of recovery catalog
searching for all files that match the pattern /u00/app/oracle/admin/DG/arch

List of Files Unknown to the Database
=====================================
File Name: /u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc
File Name: /u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc
File Name: /u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc


Now I tried the flashback on the former primary again, using the same statement I had found in the alert log before:

SQL> FLASHBACK DATABASE TO SCN 335290
  2  ;
FLASHBACK DATABASE TO SCN 335290
*
ERROR at line 1:
ORA-38754: FLASHBACK DATABASE not started; required redo log is not available
ORA-38761: redo log sequence 78 in thread 1, incarnation 6 could not be accessed

Nope, obviously we need more logs. So let's get all the ones that are missing in comparison with the new primary.
In my case these were the logs DG_1_77_702056159.arc and DG_1_78_702056159.arc.
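
To locate them on the new primary, a query like this can help; the sequence range is taken from the ORA-38761 message and is of course specific to my case:

SQL> select sequence#, name, first_change#
  2  from v$archived_log
  3  where sequence# between 77 and 79
  4  order by sequence#;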

After registering them with the same method as above, we can try the flashback again:

SQL> FLASHBACK DATABASE TO SCN 335290;
FLASHBACK DATABASE TO SCN 335290
*
ERROR at line 1:
ORA-38770: FLASHBACK DATABASE failed during recovery.
ORA-00283: recovery session canceled due to errors
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/oradata/DG/redog2m2DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 2 thread 1: '/u00/oradata/DG/redog2m1DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3


Oops, I forgot to clear log file group 2. Let's do that now and then retry:

SQL> alter database clear unarchived logfile group 2;

Database altered.

SQL> FLASHBACK DATABASE TO SCN 335290;

Flashback complete.

Cool, now we can reissue the reinstatement:

DGMGRL> connect sys@DG_SITE2
Password:
Connected.
DGMGRL> reinstate database 'DG_SITE1';
Reinstating database "DG_SITE1", please wait...
Operation requires shutdown of instance "DG" on database "DG_SITE1"
Shutting down instance "DG"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "DG" on database "DG_SITE1"
Starting instance "DG"...
ORACLE instance started.
Database mounted.
Continuing to reinstate database "DG_SITE1" ...
Reinstatement of database "DG_SITE1" succeeded
DGMGRL> show configuration

Configuration
  Name:                DG
  Enabled:             YES
  Protection Mode:     MaxPerformance
  Fast-Start Failover: DISABLED
  Databases:
    DG_SITE1 - Physical standby database
    DG_SITE2 - Primary database

Current status for "DG":
SUCCESS

Step Four – Check the Setup with a Switchover

To wrap up, we re-enable Maximum Availability and then verify everything by performing a switchover.

DGMGRL> edit configuration set protection mode as maxavailability;
Succeeded.
DGMGRL> show configuration

Configuration
  Name:                DG
  Enabled:             YES
  Protection Mode:     MaxAvailability
  Fast-Start Failover: DISABLED
  Databases:
    DG_SITE1 - Physical standby database
    DG_SITE2 - Primary database

Current status for "DG":
SUCCESS


I made some log switches at the new primary and checked log reception and recovery at the new standby (the former primary).
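
The log switches themselves are just this, issued a few times on the new primary:

SQL> alter system switch logfile;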

SQL> select name, applied from v$archived_log
  2  where first_change# >=335153 order by first_change#;

NAME                                                    APP
------------------------------------------------------- ---
/u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc     NO
/u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc     YES
/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc      NO
/u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arc      NO
/u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arc      NO
/u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arc      NO
/u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arc      NO
/u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arc      NO
/u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arc      NO

9 rows selected.

It seems that the standby receives the logs but does not apply them in our new incarnation. Hmm.
The alert log says:

Fri Nov  6 11:54:26 2009
Managed Standby Recovery starting Real Time Apply
 parallel recovery started with 2 processes
Fri Nov  6 11:54:26 2009
Waiting for all non-current ORLs to be archived...
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc
Identified End-Of-Redo for thread 1 sequence 79
Fri Nov  6 11:54:26 2009
Incomplete Recovery applied until change 335292
MRP0: Background Media Recovery applied all available redo. Recovery will be restarted once new redo branch is registered
Fri Nov  6 11:54:26 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_mrp0_21557.trc:
ORA-19906: recovery target incarnation changed during recovery
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Fri Nov  6 11:54:27 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_mrp0_21557.trc:
ORA-19906: recovery target incarnation changed during recovery

Let's fix this by creating a new standby controlfile. At the new primary:

SQL> alter database create standby controlfile as '/tmp/stbyctlDG.ctl';

Database altered.

I shut down (immediate) the new standby, copied the standby control file I had just created to the proper locations, and started the database again in mount mode.
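
If you are not sure where those copies have to go, the control_files parameter tells you:

SQL> show parameter control_files
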
Alert Log:

Fetching gap sequence in thread 1, gap sequence 1-7
Fri Nov  6 12:02:28 2009
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arc'
Fri Nov  6 12:02:30 2009
Primary database is in MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
RFS[1]: Successfully opened standby log 22: '/u00/oradata/DG/redog22m1DG.dbf'
Fri Nov  6 12:02:58 2009
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_8_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_9_702212594.arc
Media Recovery Waiting for thread 1 sequence 10 (in transit)
Fri Nov  6 12:02:59 2009
Recovery of Online Redo Log: Thread 1 Group 22 Seq 10 Reading mem 0
  Mem# 0: /u00/oradata/DG/redog22m1DG.dbf
  Mem# 1: /u01/oradata/DG/redog22m2DG.dbf


The same query as above:
SQL> select name, applied from v$archived_log
  2  where first_change# >=335153 order by first_change#;

NAME                                                    APP
------------------------------------------------------- ---
/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc      YES
/u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arc      YES
/u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arc      YES
/u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arc      YES
/u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arc      YES
/u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arc      YES
/u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arc      YES
/u00/app/oracle/admin/DG/arch/DG_1_8_702212594.arc      YES
/u00/app/oracle/admin/DG/arch/DG_1_9_702212594.arc      YES
/u00/app/oracle/admin/DG/arch/DG_1_10_702212594.arc     YES

10 rows selected.

What about flashback? Is it still enabled on my current standby database?
SQL> select flashback_on from v$database;

FLASHBACK_ON
------------------
NO


Okay, let's fix this quickly. Redo apply has to be stopped before flashback can be switched on, so we briefly turn the log apply off:

DGMGRL> connect sys@DG_SITE1
Password:
Connected.
DGMGRL> edit database 'DG_SITE1' set state='LOG-APPLY-OFF';
Succeeded.

SQL> alter database flashback on;

Database altered.

DGMGRL> connect sys@DG_SITE1
Password:
Connected.
DGMGRL> edit database 'DG_SITE1' set state='ONLINE';
Succeeded.

Yes, now I am happy and ready for the switchover, which changes the roles once again, but this time gracefully: DG_SITE1 becomes the primary and DG_SITE2 the standby database again.

DGMGRL> switchover to 'DG_SITE1';
Performing switchover NOW, please wait...
Operation requires shutdown of instance "DG" on database "DG_SITE2"
Shutting down instance "DG"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires shutdown of instance "DG" on database "DG_SITE1"
Shutting down instance "DG"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "DG" on database "DG_SITE2"
Starting instance "DG"...
ORACLE instance started.
Database mounted.
Operation requires startup of instance "DG" on database "DG_SITE1"
Starting instance "DG"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "DG_SITE1"
DGMGRL> show configuration;

Configuration
  Name:                DG
  Enabled:             YES
  Protection Mode:     MaxAvailability
  Fast-Start Failover: DISABLED
  Databases:
    DG_SITE1 - Primary database
    DG_SITE2 - Physical standby database

Current status for "DG":
SUCCESS

Everything is fine again, and we saved the time it would have taken to set up the standby anew from a backup or a clone of the new primary, which can be essential when you think about terabyte-sized databases.

Dear Reader!
Never lose your online redo log files! Never ever lose your online redo log files! Have fun with the big repertoire of possibilities in recovery scenarios with Data Guard!

Mathias
