This post was originally published on this site
In a previous blog post I have explained how to get the basic configuration from x$drc and display something like:
OBJECT_ID DATABASE INTENDED_STATE CONNECT_STRING ENABLED ROLE RECEIVE_FROM SHIP_TO FSFOVALIDITY STATUS --------- -------------- ----------------- ---------------------------- ------- -------- ------------ -------------- ------------ ------- 16842752 toolcdb1_site1 READ-WRITE-XPTON newbox01:1521/TOOLCDB1_SITE1 YES PRIMARY -N/A- toolcdb1_site2 2 SUCCESS 33619968 toolcdb1_site2 PHYSICAL-APPLY-ON newbox02:1521/TOOLCDB1_SITE2 YES PHYSICAL -UNKNOWN- -N/A- 1 SUCCESS
There are other possibilities, by using the DBMS_DRS PL/SQL package.
The package is quite rich. In order to get more details, I use CHECK_CONNECT to check the connectivity to the member databases:
PROCEDURE CHECK_CONNECT Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- MEMBER_NAME VARCHAR2 IN INSTANCE_NAME VARCHAR2 IN
example:
SQL> execute dbms_drs.check_connect('TOOLCDB1_SITE2','TOOLCDB1'); PL/SQL procedure successfully completed. SQL> execute dbms_drs.check_connect('TOOLCDB1_SITE3','TOOLCDB1'); BEGIN dbms_drs.check_connect('TOOLCDB1_SITE3','TOOLCDB1'); END; * ERROR at line 1: ORA-16596: member not part of the Oracle Data Guard broker configuration ORA-06512: at "SYS.DBMS_DRS", line 1851 ORA-06512: at line 1 SQL> execute dbms_drs.check_connect('TOOLCDB1_SITE2','TOOLCDB1'); BEGIN dbms_drs.check_connect('TOOLCDB1_SITE2','TOOLCDB1'); END; * ERROR at line 1: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor ORA-06512: at "SYS.DBMS_DRS", line 1851 ORA-06512: at line 1
In the first case I get no exceptions, that means that the database is reachable using the DGConnectIdentifier specified in the configuration (‘TOOLCDB1_SITE2’ is my database name in the configuration, it is NOT a TNS entry. I use EZConnect in my lab).
In the second case I specify a database that is not in the configuration.
In the third case, it looks like the database is down (no service), or the DGConnectIdentifier is not correct.
GET_PROPERTY_OBJ is useful to get e single property of a database/instance:
FUNCTION GET_PROPERTY_OBJ RETURNS VARCHAR2 Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- OBJECT_ID NUMBER(38) IN PROPERTY_NAME VARCHAR2 IN
Example:
SQL> SELECT numtodsinterval(dbms_drs.get_property_obj(16842752,'TransportLagThreshold'),'second') > FROM dual; NUMTODSINTERVAL(DBMS_DRS.GET_PROPERTY_OBJ(16842752,'TRANSPORTLAGTHRESHOLD') --------------------------------------------------------------------------- +000000000 00:00:30.000000000
Here I have, for the primary (the object_id from x$drc), a TransportLagThreshold of 30 seconds.
DO_CONTROL does a specific check and returns a document with the results:
PROCEDURE DO_CONTROL Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- INDOC VARCHAR2 IN OUTDOC VARCHAR2 OUT REQUEST_ID NUMBER(38) IN/OUT PIECE NUMBER(38) IN CONTEXT VARCHAR2 IN DEFAULT
The problem is… what’s the format for indoc?
To get the correct format, I have enabled sql trace to get the executions, with bind variables, of the dgmgrl commands. It happens that the input format is XML and the output format is HTML.
This is how you can get the LogXptStatus, for example:
SQL> r 1 DECLARE 2 -- variables for the dbms_drs.do_control 3 v_indoc VARCHAR2 ( 4000 ); 4 v_outdoc VARCHAR2 ( 4000 ); 5 v_rid NUMBER; 6 v_context VARCHAR2(100); 7 v_pieceno NUMBER ; 8 BEGIN 9 v_indoc := '<DO_MONITOR version="19.1"><PROPERTY name="LogXptStatus" object_id="16842752"/></DO_MONITOR>'; 10 v_pieceno := 1; 11 dbms_drs.do_control(v_indoc, v_outdoc, v_rid, v_pieceno, v_context); 12 dbms_output.put_line (v_outdoc); 13* END; <TABLE name="LOG TRANSPORT STATUS"><DESCRIPTION ><COLUMN name="PRIMARY_INSTANCE_NAME" type="string" max_length="22"></COLUMN><COLUMN name="STANDBY_DATABASE_NAME" type="string" max_length="31"></COLUMN><COLUMN name="STATUS" type="string" max_length="10"></COLUMN><COLUMN name="ERROR" type="string" max_length="256"></COLUMN></DESCRIPTION><TR ><TD >TOOLCDB1</TD><TD >toolcdb1_site2</TD><TD >VALID</TD><TD ></TD></TR></TABLE>
The big script
So I said… why not trying to have a comprehensive SQL script to check a few vital statuses of Data Guard?
This is the script that came out:
-- -- Author : Ludovico Caldara -- Version : 0.1 -- Purpose : Checks the health of a Data Guard configuration on ONE database -- Run as : SYSDBA , to execute on each DB in the config -- it does not check ALL the DBs in the configuration but only the current one -- You can use a wrapper to check all the DBs in the configuration -- or all the standby instances on a server -- Limitations: Does not work on RAC environments yet -- Does not work on 11g databases (tested on 19c) set serveroutput on set lines 200 DECLARE v_dgconfig BINARY_INTEGER; v_num_errors BINARY_INTEGER; v_num_warnings BINARY_INTEGER; v_apply_lag INTERVAL DAY TO SECOND; v_transport_lag INTERVAL DAY TO SECOND; v_apply_th INTERVAL DAY TO SECOND; v_transport_th INTERVAL DAY TO SECOND; v_delay INTERVAL DAY TO SECOND; v_delaymins BINARY_INTEGER; v_flashback v$database.flashback_on%type; CURSOR c_dgconfig IS SELECT piv.*, obj.status FROM ( SELECT object_id, attribute, value FROM x$drc WHERE object_id IN ( SELECT object_id FROM x$drc WHERE attribute = 'DATABASE') ) drc PIVOT ( MAX ( value ) FOR attribute IN ( 'DATABASE' DATABASE , 'intended_state' intended_state , 'connect_string' connect_string , 'enabled' enabled , 'role' role , 'receive_from' receive_from , 'ship_to' ship_to , 'FSFOTargetValidity' FSFOTargetValidity ) ) piv JOIN x$drc obj ON ( obj.object_id = piv.object_id AND obj.attribute = 'DATABASE' ) WHERE upper(piv.database)=sys_context('USERENV','DB_UNIQUE_NAME'); CURSOR c_priconfig IS SELECT piv.*, obj.status FROM ( SELECT object_id, attribute, value FROM x$drc WHERE object_id IN ( SELECT object_id FROM x$drc WHERE attribute = 'DATABASE') ) drc PIVOT ( MAX ( value ) FOR attribute IN ( 'DATABASE' DATABASE , 'intended_state' intended_state , 'connect_string' connect_string , 'enabled' enabled , 'role' role , 'receive_from' receive_from , 'ship_to' ship_to , 'FSFOTargetValidity' FSFOTargetValidity ) ) piv JOIN x$drc obj ON ( obj.object_id = piv.object_id AND obj.attribute = 'DATABASE' ) WHERE piv.role='PRIMARY'; r_dgconfig c_dgconfig%ROWTYPE; r_priconfig c_priconfig%ROWTYPE; v_open_mode v$database.open_mode%TYPE; -- variables for the dbms_drs.do_control v_indoc VARCHAR2 ( 4000 ); v_outdoc VARCHAR2 ( 4000 ); v_rid NUMBER; v_context VARCHAR2(100); v_pieceno NUMBER ; /* xmltype does not work on mounted databases v_y CLOB; v_z XMLTYPE; v_xml XMLTYPE; */ v_status VARCHAR2(100); v_error VARCHAR2(100); v_p_connect BINARY_INTEGER; v_s_connect BINARY_INTEGER; v_offline_datafiles BINARY_INTEGER; BEGIN v_num_errors := 0; v_num_warnings := 0; v_p_connect := 0; v_s_connect := 0; dbms_output.put_line('Checking Data Guard Configuration for '||sys_context('USERENV','DB_UNIQUE_NAME')); dbms_output.put_line('--------------------------------------'); -- get open_mode SELECT open_mode INTO v_open_mode FROM v$database; -- check if the configuration exists SELECT count(*) INTO v_dgconfig FROM x$drc; IF v_dgconfig = 0 THEN dbms_output.put_line('ERROR: Current database does not have a Data Guard config.'); v_num_errors := v_num_errors + 1; GOTO stop_checks; else dbms_output.put_line('___OK: Current database has a Data Guard config.'); END IF; -- fetch the current DB config in record OPEN c_dgconfig; BEGIN FETCH c_dgconfig INTO r_dgconfig; EXCEPTION WHEN NO_DATA_FOUND THEN dbms_output.put_line('ERROR: Current database does not have a Data Guard config.'); v_num_errors := v_num_errors + 1; GOTO stop_checks; END; -- fetch the primary DB config in record OPEN c_priconfig; BEGIN FETCH c_priconfig INTO r_priconfig; EXCEPTION WHEN NO_DATA_FOUND THEN dbms_output.put_line('ERROR: There is no primary database in the config?'); v_num_errors := v_num_errors + 1; GOTO stop_checks; END; -- enabled? IF r_dgconfig.enabled = 'YES' THEN dbms_output.put_line('___OK: Current database is enabled in Data Guard.'); ELSE dbms_output.put_line('ERROR: Current database is not enabled in Data Guard.'); v_num_errors := v_num_errors + 1; END IF; -- status SUCCESS? IF r_dgconfig.status = 'SUCCESS' THEN dbms_output.put_line('___OK: Data Guard status for the database is: '||r_dgconfig.status); ELSE dbms_output.put_line('ERROR: Data Guard status for the database is: '||r_dgconfig.status); v_num_errors := v_num_errors + 1; END IF; -- reachability of the primary BEGIN dbms_drs.CHECK_CONNECT (r_priconfig.database ,r_priconfig.database); dbms_output.put_line('___OK: Primary ('||r_priconfig.database||') is reachable.'); v_p_connect := 1; EXCEPTION WHEN OTHERS THEN dbms_output.put_line('ERROR: Primary ('||r_priconfig.database||') unreachable. Error code ' || SQLCODE || ': ' || SQLERRM); v_num_errors := v_num_errors + 1; END; -- if we are not on the primary, check the current database connectivity as well through the broker IF r_priconfig.object_id <> r_dgconfig.object_id THEN BEGIN dbms_drs.CHECK_CONNECT (r_dgconfig.database ,r_dgconfig.database); dbms_output.put_line('___OK: current DB ('||r_dgconfig.database||') is reachable.'); v_s_connect := 1; EXCEPTION WHEN OTHERS THEN dbms_output.put_line('ERROR: current DB ('||r_dgconfig.database||') unreachable. Error code ' || SQLCODE || ': ' || SQLERRM); v_num_errors := v_num_errors + 1; END; END IF; -- we check primary transport only if reachable IF v_p_connect = 1 THEN -- primary logxpt? v_indoc := '<DO_MONITOR version="19.1"><PROPERTY name="LogXptStatus" object_id="'||r_priconfig.object_id||'"/></DO_MONITOR>'; v_pieceno := 1; dbms_drs.do_control(v_indoc, v_outdoc, v_rid, v_pieceno, v_context); select regexp_substr(v_outdoc, '(<TD >)([[:alnum:]].*?)(</TD>)',1,3,'i',2) into v_status from dual; /* does not work on MOUNTED databases v_y := TO_CLOB ( v_outdoc ); v_z := XMLType ( v_y ); select xt.status , xt.error into v_status, v_error from xmltable ('/TABLE/TR' passing v_z columns status varchar2(100) PATH 'TD[3]', error varchar2(100) PATH 'TD[4]' ) xt ; */ IF v_status = 'VALID' THEN dbms_output.put_line('___OK: LogXptStatus of primary is VALID.'); ELSE dbms_output.put_line('ERROR: LogXptStatus of primary is '||nvl(v_status,'NULL')); v_num_errors := v_num_errors + 1; END IF; END IF; -- flashback? SELECT flashback_on into v_flashback FROM v$database; IF v_flashback = 'YES' THEN dbms_output.put_line('___OK: Flashback Logging is enabled.'); ELSE dbms_output.put_line('_WARN: Flashback Logging is disabled.'); v_num_warnings := v_num_warnings + 1; END IF; -- role? IF r_dgconfig.ROLE = 'PRIMARY' THEN dbms_output.put_line('___OK: The database is PRIMARY, skipping standby checks.'); GOTO stop_checks; ELSE dbms_output.put_line('___OK: The database is STANDBY, executing standby checks.'); END IF; -- intended state? IF r_dgconfig.intended_state = 'PHYSICAL-APPLY-ON' THEN dbms_output.put_line('___OK: The database intended state is APPLY-ON.'); ELSIF r_dgconfig.intended_state = 'PHYSICAL-APPLY-READY' THEN dbms_output.put_line('_WARN: The database intended state is APPLY-OFF.'); v_num_warnings := v_num_warnings + 1; ELSE dbms_output.put_line('ERROR: The database intended state is '||r_dgconfig.intended_state); v_num_errors := v_num_errors + 1; END IF; -- real time apply? IF v_open_mode = 'READ ONLY WITH APPLY' THEN dbms_output.put_line('_WARN: Real Time Apply is used.'); v_num_warnings := v_num_warnings + 1; ELSIF v_open_mode = 'MOUNTED' THEN dbms_output.put_line('___OK: The standby database is mounted.'); ELSE dbms_output.put_line('ERROR: The database open_mode is '||v_open_mode); v_num_errors := v_num_errors + 1; END IF; -- offline datafiles? BEGIN select count(distinct con_id) into v_offline_datafiles from v$recover_file where online_status='OFFLINE' group by con_id; dbms_output.put_line('ERROR: There are '||v_offline_datafiles||' OFFLINE datafiles'); v_num_errors := v_num_errors + 1; EXCEPTION WHEN NO_DATA_FOUND THEN dbms_output.put_line('___OK: There are no PDBs with OFFLINE datafiles'); END; -- we get the delay as well, so that we can compute the apply threshold in a more intelligent way than the broker... v_delaymins := dbms_drs.get_property_obj(r_dgconfig.object_id,'DelayMins'); v_delay := numtodsinterval(v_delaymins,'minute'); IF v_delaymins > 0 THEN dbms_output.put_line('_WARN: Standby delayed by '||v_delaymins||' minutes.'); v_num_warnings := v_num_warnings + 1; END IF; -- apply lag? v_apply_th := numtodsinterval(dbms_drs.get_property_obj(r_dgconfig.object_id,'ApplyLagThreshold'),'second'); BEGIN SELECT TO_DSINTERVAL(value) into v_apply_lag FROM v$dataguard_stats WHERE name='apply lag'; IF v_apply_lag > ( v_apply_th + v_delay ) THEN dbms_output.put_line('ERROR: apply lag is '||v_apply_lag); v_num_errors := v_num_errors + 1; ELSE dbms_output.put_line('___OK: apply lag is '||v_apply_lag); END IF; EXCEPTION WHEN OTHERS THEN dbms_output.put_line('ERROR: cannot determine apply lag.'); v_num_errors := v_num_errors + 1; END; -- transport lag? v_transport_lag := numtodsinterval(dbms_drs.get_property_obj(r_dgconfig.object_id,'TransportLagThreshold'),'second'); BEGIN SELECT TO_DSINTERVAL(value) into v_transport_lag FROM v$dataguard_stats WHERE name='transport lag'; IF v_transport_lag > v_transport_th THEN dbms_output.put_line('ERROR: transport lag is '||v_transport_lag); v_num_errors := v_num_errors + 1; ELSE dbms_output.put_line('___OK: transport lag is '||v_transport_lag); END IF; EXCEPTION WHEN OTHERS THEN dbms_output.put_line('_WARN: cannot determine transport lag.'); v_num_warnings := v_num_warnings + 1; END; <<stop_checks>> dbms_output.put_line('--------------------------------------'); IF v_num_errors > 0 THEN dbms_output.put_line('RESULT: ERROR: '||to_char(v_num_errors)||' errors - '||to_char(v_num_warnings)||' warnings'); ELSIF v_num_warnings > 0 THEN dbms_output.put_line('RESULT: _WARN: '||to_char(v_num_errors)||' errors - '||to_char(v_num_warnings)||' warnings'); ELSE dbms_output.put_line('RESULT: ___OK: '||to_char(v_num_errors)||' errors - '||to_char(v_num_warnings)||' warnings'); END IF; END; /
Of course, it is not perfect (many checks missing: FSFO readiness, observer checks, etc.), but it is good enough for base monitoring. Also, it’s faster than a normal shell+dgmgrl script.
Output on a Primary database:
SQL> @check_dg_config Checking Data Guard Configuration for TOOLCDB1_SITE1 -------------------------------------- ___OK: Current database has a Data Guard config. ___OK: Current database is enabled in Data Guard. ___OK: Data Guard status for the database is: SUCCESS ___OK: Primary (toolcdb1_site1) is reachable. ___OK: LogXptStatus of primary is VALID. ___OK: Flashback Logging is enabled. ___OK: The database is PRIMARY, skipping standby checks. -------------------------------------- RESULT: ___OK: 0 errors - 0 warnings PL/SQL procedure successfully completed.
Output on a standby database:
SQL> @check_dg_config.sql Checking Data Guard Configuration for TOOLCDB1_SITE2 -------------------------------------- ___OK: Current database has a Data Guard config. ___OK: Current database is enabled in Data Guard. ___OK: Data Guard status for the database is: SUCCESS ___OK: Primary (toolcdb1_site1) is reachable. ___OK: current DB (toolcdb1_site2) is reachable. ___OK: LogXptStatus of primary is VALID. ___OK: Flashback Logging is enabled. ___OK: The database is STANDBY, executing standby checks. ___OK: The database intended state is APPLY-ON. _WARN: Real Time Apply is used. ___OK: There are no PDBs with OFFLINE datafiles ___OK: apply lag is +00 00:00:00.000000 ___OK: transport lag is +00 00:00:00.000000 -------------------------------------- RESULT: _WARN: 0 errors - 1 warnings PL/SQL procedure successfully completed.
In case of errors (e.g. standby listener stopped), I would get:
SQL> @check_dg_config.sql Checking Data Guard Configuration for TOOLCDB1_SITE2 -------------------------------------- ___OK: Current database has a Data Guard config. ___OK: Current database is enabled in Data Guard. ___OK: Data Guard status for the database is: SUCCESS ___OK: Primary (toolcdb1_site1) is reachable. ERROR: current DB (toolcdb1_site2) unreachable. Error code -12541: ORA-12541: TNS:no listener ___OK: LogXptStatus of primary is VALID. ___OK: Flashback Logging is enabled. ___OK: The database is STANDBY, executing standby checks. ___OK: The database intended state is APPLY-ON. _WARN: Real Time Apply is used. ___OK: There are no PDBs with OFFLINE datafiles ___OK: apply lag is +00 00:00:00.000000 ___OK: transport lag is +00 00:00:00.000000 -------------------------------------- RESULT: ERROR: 1 errors - 1 warnings PL/SQL procedure successfully completed.
So easy to spot the error and use a shell wrapper to grep ^ERROR or similar.
Be careful, the script is not RAC aware, and it lacks some checks, so you might want to reuse it and extend it to fit your exact configuration.
Hope you like it!
—
Ludovico