Hi there!
In my last Data Guard course this week I gave my participants the task to recover from a situation where they lost the online redo log files of the primary database. They had a Maximum Availability Setup with Synchronous Log Transfer. Of course due to this setup there was no loss of any data when performing the failover to the standby site. The participants were able to failover the right way. Some also used an observer for doing this. The harder task after the failover was the reinstatement of the former primary, the one with the lost logs. Reinstatement means using it again as standby instantly and to save the time to set it up again. To let this work you need to have flashback enabled before.In this blog entry I want to summarize the needed steps for the whole scenario.Environment looked like this:10.2.0.4.0, Linux x86_64, flashback is enabled on Primary and StandbyData Guard Configuration Overview:DGMGRL> connect sys@DG_SITE1Password:Connected.DGMGRL> show configurationConfiguration Name: DG Enabled: YES Protection Mode: MaxAvailability Fast-Start Failover: DISABLED Databases: DG_SITE1 - Primary database DG_SITE2 - Physical standby databaseCurrent status for "DG":SUCCESSStep One – Lose the Online Logs at PrimaryOh yeah that’s the funny part!
I will delete 'em, they are important. But of course i made them very important. After a bunch of nonsense transactions all my three Online Log Groups are ACTIVE, which means they contain information not yet written down by Database Writer:SQL> select group#, sequence#, members, status from v$log; GROUP# SEQUENCE# MEMBERS STATUS---------- ---------- ---------- ---------------- 1 75 2 ACTIVE 2 76 2 ACTIVE 3 77 2 CURRENTNow please go away :-)select 'rm ' || member from v$logfile f, v$log lwhere f.group# = l.group#;SQL> select 'rm ' || member from v$logfile f, v$log l 2 where f.group# = l.group#;'RM'||MEMBER-----------------------------------rm /u00/oradata/DG/redog1m1DG.dbfrm /u01/oradata/DG/redog1m2DG.dbfrm /u00/oradata/DG/redog2m1DG.dbfrm /u01/oradata/DG/redog2m2DG.dbfrm /u00/oradata/DG/redog3m1DG.dbfrm /u01/oradata/DG/redog3m2DG.dbf6 rows selected.After calling those 6 statements, I switched the logfiles, until the db was hanging. It was the third switch. The alert log shows a couple of such errors:Fri Nov 6 10:58:45 2009Errors in file /u00/app/oracle/admin/DG/bdump/dg_arc1_21328.trc:ORA-00313: open failed for members of log group 3 of thread 1ORA-00312: online log 3 thread 1: '/u01/oradata/DG/redog3m2DG.dbf'ORA-27037: unable to obtain file statusLinux-x86_64 Error: 2: No such file or directoryAdditional information: 3ORA-00312: online log 3 thread 1: '/u00/oradata/DG/redog3m1DG.dbf'ORA-27037: unable to obtain file statusLinux-x86_64 Error: 2: No such file or directoryAdditional information: 3Step Two – FailoverOkay now it is time to let our users work again on our standby site. So lets activate the standby.DGMGRL> connect sys@DG_SITE2Password:Connected.DGMGRL> failover to 'DG_SITE2'Performing failover NOW, please wait...Failover succeeded, new primary is "DG_SITE2"I also shut down old primary with "shutdown abort".That’s all. The users are happy and can connect to the new primary database. No data loss occurred due to the synchronous setup.Step Three – Rebuild the Redundancy with ReinstatementThe Data Guard Configuration looks now like this:DGMGRL> connect sys@DG_SITE2Password:Connected.DGMGRL> show configuration;Configuration Name: DG Enabled: YES Protection Mode: MaxPerformance Fast-Start Failover: DISABLED Databases: DG_SITE1 - Physical standby database (disabled) DG_SITE2 - Primary databaseCurrent status for "DG":SUCCESSNote: We are automatically in Max Performance Mode due to the fact that we are on a single instance solution for the moment.Next thing is to restart the former primary in mount mode to get it reinstated. After this the broker says how to proceed:DGMGRL> connect sys@DG_SITE1Password:Connected.DGMGRL> show configuration;Error: ORA-16795: database resource guard detects that database re-creation is requiredConfiguration details cannot be determined by DGMGRLThe broker demands the reinstatement of DG_SITE1, which would do the observer automatically. Let’s do this now:DGMGRL> connect sys@DG_SITE2Password:Connected.DGMGRL> reinstate database 'DG_SITE1';Reinstating database "DG_SITE1", please wait...Error: ORA-16653: failed to reinstate databaseFailed.Reinstatement of database "DG_SITE1" failedThe reinstatement failed, let's have a look at the alert log:FLASHBACK DATABASE TO SCN 335290Fri Nov 6 11:20:45 2009Errors in file /u00/app/oracle/admin/DG/bdump/dg_rsm0_21445.trc:ORA-00313: open failed for members of log group 1 of thread 1ORA-00312: online log 1 thread 1: '/u01/oradata/DG/redog1m2DG.dbf'ORA-27037: unable to obtain file statusLinux-x86_64 Error: 2: No such file or directoryAdditional information: 3ORA-00312: online log 1 thread 1: '/u00/oradata/DG/redog1m1DG.dbf'ORA-27037: unable to obtain file statusLinux-x86_64 Error: 2: No such file or directoryAdditional information: 3Fri Nov 6 11:20:45 2009Errors in file /u00/app/oracle/admin/DG/bdump/dg_rsm0_21445.trc:ORA-00313: open failed for members of log group 2 of thread 1ORA-00312: online log 2 thread 1: '/u01/oradata/DG/redog2m2DG.dbf'ORA-27037: unable to obtain file statusLinux-x86_64 Error: 2: No such file or directoryAdditional information: 3ORA-00312: online log 2 thread 1: '/u00/oradata/DG/redog2m1DG.dbf'ORA-27037: unable to obtain file statusLinux-x86_64 Error: 2: No such file or directoryAdditional information: 3Fri Nov 6 11:20:45 2009Errors in file /u00/app/oracle/admin/DG/bdump/dg_rsm0_21445.trc:ORA-00313: open failed for members of log group 3 of thread 1ORA-00312: online log 3 thread 1: '/u01/oradata/DG/redog3m2DG.dbf'ORA-27037: unable to obtain file statusLinux-x86_64 Error: 2: No such file or directoryAdditional information: 3ORA-00312: online log 3 thread 1: '/u00/oradata/DG/redog3m1DG.dbf'ORA-27037: unable to obtain file statusLinux-x86_64 Error: 2: No such file or directoryAdditional information: 3ORA-38754 signalled during: FLASHBACK DATABASE TO SCN 335290...The Log Message from drcDG.log which is located in the same directory is even more frustating:DG 2009-11-06-11:20:45 0 2 0 Physical RSM: Reinstatement failed. Flashback database to SCN 335290 fails. If there is insufficient flashback log to flashback this database, reinstatement will never succeed. You will have to do a full physical standby re-instantiation of this database.Okay flashback operation which was called by the reinstatement failed due to the missing logs.Hmm, let's clear the logs:SQL> alter database clear unarchived logfile group 1;Database altered.SQL> alter database clear unarchived logfile group 2;alter database clear unarchived logfile group 2*ERROR at line 1:ORA-01624: log 2 needed for crash recovery of instance DG (thread 1)ORA-00312: online log 2 thread 1: '/u00/oradata/DG/redog2m1DG.dbf'ORA-00312: online log 2 thread 1: '/u01/oradata/DG/redog2m2DG.dbf'SQL> alter database clear unarchived logfile group 3;Database altered.Okay I could not clear logfile group 2 because it is know from controlfile that in any case it is needed to recover from it. And that's why also the flashback would still fail. So we need this redo log information. Hey, we should find it at our new primary, right!? Let's first query what we need:SQL> select first_change# from v$log where group#=2;FIRST_CHANGE#------------- 335153On the new primary we search for this scn in the archived redo logs after a log switch:SQL> select name from v$archived_log where FIRST_CHANGE#>=335153;NAME-----------------------------------------------------/u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arcSo I copy those 2 logs from 2 different incarnations (cause of the resetlogs of the failover) from the new primary to the former primary and register it there with RMAN:RMAN> catalog start with '/u00/app/oracle/admin/DG/arch';using target database control file instead of recovery catalogsearching for all files that match the pattern /u00/app/oracle/admin/DG/archList of Files Unknown to the Database=====================================File Name: /u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arcFile Name: /u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arcDo you really want to catalog the above files (enter YES or NO)? yescataloging files...cataloging doneList of Cataloged Files=======================File Name: /u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arcFile Name: /u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arcNow I tried the flashback on the former primary again, I used the same statement which I found in the alert log before:SQL> FLASHBACK DATABASE TO SCN 335290 2 ;FLASHBACK DATABASE TO SCN 335290*ERROR at line 1:ORA-38754: FLASHBACK DATABASE not started; required redo log is not availableORA-38761: redo log sequence 78 in thread 1, incarnation 6 could not be accessedNope, we need obviously more logs. So lets get all which are missing in comparison with the new primary:In my case this were the logs DG_1_77_702056159.arc and DG_1_78_702056159.arc.After registering them with the same method like above, we can try the flashback again:
SQL> FLASHBACK DATABASE TO SCN 335290;FLASHBACK DATABASE TO SCN 335290*ERROR at line 1:ORA-38770: FLASHBACK DATABASE failed during recovery.ORA-00283: recovery session canceled due to errorsORA-00313: open failed for members of log group 2 of thread 1ORA-00312: online log 2 thread 1: '/u01/oradata/DG/redog2m2DG.dbf'ORA-27037: unable to obtain file statusLinux-x86_64 Error: 2: No such file or directoryAdditional information: 3ORA-00312: online log 2 thread 1: '/u00/oradata/DG/redog2m1DG.dbf'ORA-27037: unable to obtain file statusLinux-x86_64 Error: 2: No such file or directoryAdditional information: 3Ooops I forgot the clearing of logfile group 2, let's do this now and retry then:SQL> alter database clear unarchived logfile group 2;Database altered.SQL> FLASHBACK DATABASE TO SCN 335290;Flashback complete.Cool, now we can reissue the reinstatement:DGMGRL> connect sys@DG_SITE2Password:Connected.DGMGRL> reinstate database 'DG_SITE1';Reinstating database "DG_SITE1", please wait...Operation requires shutdown of instance "DG" on database "DG_SITE1"Shutting down instance "DG"...ORA-01109: database not openDatabase dismounted.ORACLE instance shut down.Operation requires startup of instance "DG" on database "DG_SITE1"Starting instance "DG"...ORACLE instance started.Database mounted.Continuing to reinstate database "DG_SITE1" ...Reinstatement of database "DG_SITE1" succeededDGMGRL> show configurationConfiguration Name: DG Enabled: YES Protection Mode: MaxPerformance Fast-Start Failover: DISABLED Databases: DG_SITE1 - Physical standby database DG_SITE2 - Primary databaseCurrent status for "DG":SUCCESSStep Four – Check the Setup with a SwitchoverNow we come to the end by reenabling Maximum Availability. Then we check everything by performing a switchover.
DGMGRL> edit configuration set protection mode as maxavailability;Succeeded.DGMGRL> show configurationConfiguration Name: DG Enabled: YES Protection Mode: MaxAvailability Fast-Start Failover: DISABLED Databases: DG_SITE1 - Physical standby database DG_SITE2 - Primary databaseCurrent status for "DG":SUCCESSI made some log switches at the new primary and checked the reception and recovery at the new standby (former primary).SQL> select name, applied from v$archived_log 2 where first_change# >=335153 order by first_change#;NAME APP------------------------------------------------------- ---/u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc NO/u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arc YES/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc NO/u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arc NO/u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arc NO/u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arc NO/u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arc NO/u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arc NO/u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arc NO9 rows selected.It seems that the standby receives the logs but does not recover in our new incarnation.. Hmm.Alert Log says:
Fri Nov 6 11:54:26 2009Managed Standby Recovery starting Real Time Apply parallel recovery started with 2 processesFri Nov 6 11:54:26 2009Waiting for all non-current ORLs to be archived...Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_79_702056159.arcIdentified End-Of-Redo for thread 1 sequence 79Fri Nov 6 11:54:26 2009Incomplete Recovery applied until change 335292MRP0: Background Media Recovery applied all available redo. Recovery will be restarted once new redo branch is registeredFri Nov 6 11:54:26 2009Errors in file /u00/app/oracle/admin/DG/bdump/dg_mrp0_21557.trc:ORA-19906: recovery target incarnation changed during recoveryManaged Standby Recovery not using Real Time ApplyRecovery interrupted!Fri Nov 6 11:54:27 2009Errors in file /u00/app/oracle/admin/DG/bdump/dg_mrp0_21557.trc:ORA-19906: recovery target incarnation changed during recoveryLet's fix this by creating a new standby controlfile. At the new primary:
SQL> alter database create standby controlfile as '/tmp/stbyctlDG.ctl';Database altered.I shut down (immediate) the new standby, copied the standby controlfile which I created right now to the right locations and started it again in mount mode.Alert Log:
Fetching gap sequence in thread 1, gap sequence 1-7Fri Nov 6 12:02:28 2009RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc'RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arc'RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arc'RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arc'RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arc'RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arc'RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arc'Fri Nov 6 12:02:30 2009Primary database is in MAXIMUM AVAILABILITY modeChanging standby controlfile to RESYNCHRONIZATION levelRFS[1]: Successfully opened standby log 22: '/u00/oradata/DG/redog22m1DG.dbf'Fri Nov 6 12:02:58 2009Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arcMedia Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arcMedia Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arcMedia Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arcMedia Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arcMedia Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arcMedia Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arcMedia Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_8_702212594.arcMedia Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_9_702212594.arcMedia Recovery Waiting for thread 1 sequence 10 (in transit)Fri Nov 6 12:02:59 2009Recovery of Online Redo Log: Thread 1 Group 22 Seq 10 Reading mem 0 Mem# 0: /u00/oradata/DG/redog22m1DG.dbf Mem# 1: /u01/oradata/DG/redog22m2DG.dbfSame Query like above:SQL> select name, applied from v$archived_log 2 where first_change# >=335153 order by first_change#;NAME APP------------------------------------------------------- ---/u00/app/oracle/admin/DG/arch/DG_1_1_702212594.arc YES/u00/app/oracle/admin/DG/arch/DG_1_2_702212594.arc YES/u00/app/oracle/admin/DG/arch/DG_1_3_702212594.arc YES/u00/app/oracle/admin/DG/arch/DG_1_4_702212594.arc YES/u00/app/oracle/admin/DG/arch/DG_1_5_702212594.arc YES/u00/app/oracle/admin/DG/arch/DG_1_6_702212594.arc YES/u00/app/oracle/admin/DG/arch/DG_1_7_702212594.arc YES/u00/app/oracle/admin/DG/arch/DG_1_8_702212594.arc YES/u00/app/oracle/admin/DG/arch/DG_1_9_702212594.arc YES/u00/app/oracle/admin/DG/arch/DG_1_10_702212594.arc YES10 rows selected.What is with the flashback? Is it still switched on on my current standby database?SQL> select flashback_on from v$database;FLASHBACK_ON------------------NOOkay, we fix this quickly:DGMGRL> connect sys@DG_SITE1Password:Connected.DGMGRL> edit database 'DG_SITE1' set state='LOG-APPLY-OFF';Succeeded.SQL> alter database flashback on;Database altered.DGMGRL> connect sys@DG_SITE1Password:Connected.DGMGRL> edit database 'DG_SITE1' set state='ONLINE';Succeeded.Yes - now I'm happy and ready for the switchover which again changes the roles. But this time gracefully. It switches back the DG_SITE1 as a primary and DG_SITE2 as a standby database.DGMGRL> switchover to 'DG_SITE1';Performing switchover NOW, please wait...Operation requires shutdown of instance "DG" on database "DG_SITE2"Shutting down instance "DG"...ORA-01109: database not openDatabase dismounted.ORACLE instance shut down.Operation requires shutdown of instance "DG" on database "DG_SITE1"Shutting down instance "DG"...ORA-01109: database not openDatabase dismounted.ORACLE instance shut down.Operation requires startup of instance "DG" on database "DG_SITE2"Starting instance "DG"...ORACLE instance started.Database mounted.Operation requires startup of instance "DG" on database "DG_SITE1"Starting instance "DG"...ORACLE instance started.Database mounted.Switchover succeeded, new primary is "DG_SITE1"DGMGRL> show configuration;Configuration Name: DG Enabled: YES Protection Mode: MaxAvailability Fast-Start Failover: DISABLED Databases: DG_SITE1 - Primary database DG_SITE2 - Physical standby databaseCurrent status for "DG":SUCCESSEverything is fine now again. And we saved the time for resetting up the standby from a backup or a clone of the new primary, which could have been essential if you think about TB databases.Dear Reader!Never lose your online redo log files! Never ever lose your online redo log files! Have fun with the big repertoire of possibilties in recovery scenarios with Data Guard!Mathias
Ping Antwort von Blogroll Report 30/10/2009-06/11/2009 « Coskan’s Approach to Oracle
Echt klasse gemacht Mathias und sehr interessant. Ich hoffe wirklich wir werden in Kontakt bleiben.
Mfg
Yann