Hi there!
In my last Data Guard course this week I gave my participants the task to recover from a situation where they lost the online redo log files of the primary database. They had a Maximum Availability Setup with Synchronous Log Transfer. Of course due to this setup there was no loss of any data when performing the failover to the standby site. The participants were able to failover the right way. Some also used an observer for doing this. The harder task after the failover was the reinstatement of the former primary, the one with the lost logs. Reinstatement means using it again as standby instantly and to save the time to set it up again. To let this work you need to have flashback enabled before.
In this blog entry I want to summarize the needed steps for the whole scenario.
Environment looked like this:
10.2.0.4.0, Linux x86_64, flashback is enabled on Primary and Standby
Data Guard Configuration Overview:
DGMGRL> connect sys@DG_SITE1
Password:
Connected.
DGMGRL> show configuration
Configuration
Name: DG
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: DISABLED
Databases:
DG_SITE1 - Primary database
DG_SITE2 - Physical standby database
Current status for "DG":
SUCCESS
Step One – Lose the Online Logs at Primary
Oh yeah that’s the funny part!
I will delete 'em, they are important. But of course i made them very important.
After a bunch of nonsense transactions all my three Online Log Groups are ACTIVE, which means they contain information not yet written down by Database Writer:
SQL> select group#, sequence#, members, status from v$log;
GROUP# SEQUENCE# MEMBERS STATUS
---------- ---------- ---------- ----------------
1 75 2 ACTIVE
2 76 2 ACTIVE
3 77 2 CURRENT
Now please go away :-)
select 'rm ' || member from v$logfile f, v$log l
where f.group# = l.group#;
SQL> select 'rm ' || member from v$logfile f, v$log l
2 where f.group# = l.group#;
'RM'||MEMBER
-----------------------------------
rm /u00/oradata/DG/redog1m1DG.dbf
rm /u01/oradata/DG/redog1m2DG.dbf
rm /u00/oradata/DG/redog2m1DG.dbf
rm /u01/oradata/DG/redog2m2DG.dbf
rm /u00/oradata/DG/redog3m1DG.dbf
rm /u01/oradata/DG/redog3m2DG.dbf
6 rows selected.
After calling those 6 statements, I switched the logfiles, until the db was hanging. It was the third switch. The alert log shows a couple of such errors:
Fri Nov 6 10:58:45 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_arc1_21328.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/oradata/DG/redog3m2DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 3 thread 1: '/u00/oradata/DG/redog3m1DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Step Two – Failover
Okay now it is time to let our users work again on our standby site. So lets activate the standby.
DGMGRL> connect sys@DG_SITE2
Password:
Connected.
DGMGRL> failover to 'DG_SITE2'
Performing failover NOW, please wait...
Failover succeeded, new primary is "DG_SITE2"
I also shut down old primary with "shutdown abort".
That’s all. The users are happy and can connect to the new primary database. No data loss occurred due to the synchronous setup.
Step Three – Rebuild the Redundancy with Reinstatement
The Data Guard Configuration looks now like this:
DGMGRL> connect sys@DG_SITE2
Password:
Connected.
DGMGRL> show configuration;
Configuration
Name: DG
Enabled: YES
Protection Mode: MaxPerformance
Fast-Start Failover: DISABLED
Databases:
DG_SITE1 - Physical standby database (disabled)
DG_SITE2 - Primary database
Current status for "DG":
SUCCESS
Note: We are automatically in Max Performance Mode due to the fact that we are on a single instance solution for the moment.
Next thing is to restart the former primary in mount mode to get it reinstated. After this the broker says how to proceed:
DGMGRL> connect sys@DG_SITE1
Password:
Connected.
DGMGRL> show configuration;
Error: ORA-16795: database resource guard detects that database re-creation is required
Configuration details cannot be determined by DGMGRL
The broker demands the reinstatement of DG_SITE1, which would do the observer automatically. Let’s do this now:
DGMGRL> connect sys@DG_SITE2
Password:
Connected.
DGMGRL> reinstate database 'DG_SITE1';
Reinstating database "DG_SITE1", please wait...
Error: ORA-16653: failed to reinstate database
Failed.
Reinstatement of database "DG_SITE1" failed
The reinstatement failed, let's have a look at the alert log:
FLASHBACK DATABASE TO SCN 335290
Fri Nov 6 11:20:45 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_rsm0_21445.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/u01/oradata/DG/redog1m2DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 1 thread 1: '/u00/oradata/DG/redog1m1DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Fri Nov 6 11:20:45 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_rsm0_21445.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/oradata/DG/redog2m2DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 2 thread 1: '/u00/oradata/DG/redog2m1DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Fri Nov 6 11:20:45 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_rsm0_21445.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/oradata/DG/redog3m2DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 3 thread 1: '/u00/oradata/DG/redog3m1DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-38754 signalled during: FLASHBACK DATABASE TO SCN 335290...
The Log Message from drcDG.log which is located in the same directory is even more frustating:
DG 2009-11-06-11:20:45 0 2 0 Physical RSM: Reinstatement failed. Flashback database to SCN 335290 fails. If there is insufficient flashback log to flashback this database, reinstatement will never succeed. You will have to do a full physical standby re-instantiation of this database.
Okay flashback operation which was called by the reinstatement failed due to the missing logs.
Hmm, let's clear the logs:
SQL> alter database clear unarchived logfile group 1;
Database altered.
SQL> alter database clear unarchived logfile group 2;
alter database clear unarchived logfile group 2
*
ERROR at line 1:
ORA-01624: log 2 needed for crash recovery of instance DG (thread 1)
ORA-00312: online log 2 thread 1: '/u00/oradata/DG/redog2m1DG.dbf'
ORA-00312: online log 2 thread 1: '/u01/oradata/DG/redog2m2DG.dbf'
SQL> alter database clear unarchived logfile group 3;
Database altered.
Okay I could not clear logfile group 2 because it is know from controlfile that in any case it is needed to recover from it. And that's why also the flashback would still fail. So we need this redo log information. Hey, we should find it at our new primary, right!? Let's first query what we need:
SQL> select first_change# from v$log where group#=2;
FIRST_CHANGE#
-------------
335153
On the new primary we search for this scn in the archived redo logs after a log switch:
SQL> select name from v$archived_log where FIRST_CHANGE#>=335153;
NAME
-----------------------------------------------------
/u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc
/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc
So I copy those 2 logs from 2 different incarnations (cause of the resetlogs of the failover) from the new primary to the former primary and register it there with RMAN:
RMAN> catalog start with '/u00/app/oracle/admin/DG/arch';
using target database control file instead of recovery catalog
searching for all files that match the pattern /u00/app/oracle/admin/DG/arch
List of Files Unknown to the Database
=====================================
File Name: /u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc
File Name: /u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc
Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done
List of Cataloged Files
=======================
File Name: /u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc
File Name: /u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc
Now I tried the flashback on the former primary again, I used the same statement which I found in the alert log before:
SQL> FLASHBACK DATABASE TO SCN 335290
2 ;
FLASHBACK DATABASE TO SCN 335290
*
ERROR at line 1:
ORA-38754: FLASHBACK DATABASE not started; required redo log is not available
ORA-38761: redo log sequence 78 in thread 1, incarnation 6 could not be accessed
Nope, we need obviously more logs. So lets get all which are missing in comparison with the new primary:
In my case this were the logs DG_1_77_702056159.arc and DG_1_78_702056159.arc.
After registering them with the same method like above, we can try the flashback again:
SQL> FLASHBACK DATABASE TO SCN 335290;
FLASHBACK DATABASE TO SCN 335290
*
ERROR at line 1:
ORA-38770: FLASHBACK DATABASE failed during recovery.
ORA-00283: recovery session canceled due to errors
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/oradata/DG/redog2m2DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 2 thread 1: '/u00/oradata/DG/redog2m1DG.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Ooops I forgot the clearing of logfile group 2, let's do this now and retry then:
SQL> alter database clear unarchived logfile group 2;
Database altered.
SQL> FLASHBACK DATABASE TO SCN 335290;
Flashback complete.
Cool, now we can reissue the reinstatement:
DGMGRL> connect sys@DG_SITE2
Password:
Connected.
DGMGRL> reinstate database 'DG_SITE1';
Reinstating database "DG_SITE1", please wait...
Operation requires shutdown of instance "DG" on database "DG_SITE1"
Shutting down instance "DG"...
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "DG" on database "DG_SITE1"
Starting instance "DG"...
ORACLE instance started.
Database mounted.
Continuing to reinstate database "DG_SITE1" ...
Reinstatement of database "DG_SITE1" succeeded
DGMGRL> show configuration
Configuration
Name: DG
Enabled: YES
Protection Mode: MaxPerformance
Fast-Start Failover: DISABLED
Databases:
DG_SITE1 - Physical standby database
DG_SITE2 - Primary database
Current status for "DG":
SUCCESS
Step Four – Check the Setup with a Switchover
Now we come to the end by reenabling Maximum Availability. Then we check everything by performing a switchover.
DGMGRL> edit configuration set protection mode as maxavailability;
Succeeded.
DGMGRL> show configuration
Configuration
Name: DG
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: DISABLED
Databases:
DG_SITE1 - Physical standby database
DG_SITE2 - Primary database
Current status for "DG":
SUCCESS
I made some log switches at the new primary and checked the reception and recovery at the new standby (former primary).
SQL> select name, applied from v$archived_log
2 where first_change# >=335153 order by first_change#;
NAME APP
------------------------------------------------------- ---
/u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc NO
/u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc YES
/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc NO
/u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arc NO
/u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arc NO
/u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arc NO
/u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arc NO
/u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arc NO
/u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arc NO
9 rows selected.
It seems that the standby receives the logs but does not recover in our new incarnation.. Hmm.
Alert Log says:
Fri Nov 6 11:54:26 2009
Managed Standby Recovery starting Real Time Apply
parallel recovery started with 2 processes
Fri Nov 6 11:54:26 2009
Waiting for all non-current ORLs to be archived...
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc
Identified End-Of-Redo for thread 1 sequence 79
Fri Nov 6 11:54:26 2009
Incomplete Recovery applied until change 335292
MRP0: Background Media Recovery applied all available redo. Recovery will be restarted once new redo branch is registered
Fri Nov 6 11:54:26 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_mrp0_21557.trc:
ORA-19906: recovery target incarnation changed during recovery
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Fri Nov 6 11:54:27 2009
Errors in file /u00/app/oracle/admin/DG/bdump/dg_mrp0_21557.trc:
ORA-19906: recovery target incarnation changed during recovery
Let's fix this by creating a new standby controlfile. At the new primary:
SQL> alter database create standby controlfile as '/tmp/stbyctlDG.ctl';
Database altered.
I shut down (immediate) the new standby, copied the standby controlfile which I created right now to the right locations and started it again in mount mode.
Alert Log:
Fetching gap sequence in thread 1, gap sequence 1-7
Fri Nov 6 12:02:28 2009
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arc'
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arc'
Fri Nov 6 12:02:30 2009
Primary database is in MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
RFS[1]: Successfully opened standby log 22: '/u00/oradata/DG/redog22m1DG.dbf'
Fri Nov 6 12:02:58 2009
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_8_702212594.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_9_702212594.arc
Media Recovery Waiting for thread 1 sequence 10 (in transit)
Fri Nov 6 12:02:59 2009
Recovery of Online Redo Log: Thread 1 Group 22 Seq 10 Reading mem 0
Mem# 0: /u00/oradata/DG/redog22m1DG.dbf
Mem# 1: /u01/oradata/DG/redog22m2DG.dbf
Same Query like above:
SQL> select name, applied from v$archived_log
2 where first_change# >=335153 order by first_change#;
NAME APP
------------------------------------------------------- ---
/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc YES
/u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arc YES
/u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arc YES
/u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arc YES
/u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arc YES
/u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arc YES
/u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arc YES
/u00/app/oracle/admin/DG/arch/DG_1_8_702212594.arc YES
/u00/app/oracle/admin/DG/arch/DG_1_9_702212594.arc YES
/u00/app/oracle/admin/DG/arch/DG_1_10_702212594.arc YES
10 rows selected.
What is with the flashback? Is it still switched on on my current standby database?
SQL> select flashback_on from v$database;
FLASHBACK_ON
------------------
NO
Okay, we fix this quickly:
DGMGRL> connect sys@DG_SITE1
Password:
Connected.
DGMGRL> edit database 'DG_SITE1' set state='LOG-APPLY-OFF';
Succeeded.
SQL> alter database flashback on;
Database altered.
DGMGRL> connect sys@DG_SITE1
Password:
Connected.
DGMGRL> edit database 'DG_SITE1' set state='ONLINE';
Succeeded.
Yes - now I'm happy and ready for the switchover which again changes the roles. But this time gracefully. It switches back the DG_SITE1 as a primary and DG_SITE2 as a standby database.
DGMGRL> switchover to 'DG_SITE1';
Performing switchover NOW, please wait...
Operation requires shutdown of instance "DG" on database "DG_SITE2"
Shutting down instance "DG"...
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
Operation requires shutdown of instance "DG" on database "DG_SITE1"
Shutting down instance "DG"...
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "DG" on database "DG_SITE2"
Starting instance "DG"...
ORACLE instance started.
Database mounted.
Operation requires startup of instance "DG" on database "DG_SITE1"
Starting instance "DG"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "DG_SITE1"
DGMGRL> show configuration;
Configuration
Name: DG
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: DISABLED
Databases:
DG_SITE1 - Primary database
DG_SITE2 - Physical standby database
Current status for "DG":
SUCCESS
Everything is fine now again. And we saved the time for resetting up the standby from a backup or a clone of the new primary, which could have been essential if you think about TB databases.
Dear Reader!
Never lose your online redo log files! Never ever lose your online redo log files! Have fun with the big repertoire of possibilties in recovery scenarios with Data Guard!
Mathias