From time to time this discussion starts by Customers but also inside my company (Trivadis J). I won't allow me to provide a definitive statement in the name of Trivadis. This would be anyway difficult because our consultants have sometimes different point of view, they are humans, not robots J. However I really would like to give you some information which could be useful for your Linux choice by deploying Oracle Based Architecture (RDBMS servers for instance). This blog entry has been posted after several mails exchanges within our Oracle/Linux specialists and this topic has been already presented during our "Enterprise Linux for databases" Technocircle. Of course this post only focuses on the currently Oracle Certified Linux releases like :
- SuSE Linux Enterprise (SLES)
- Red Hat Enterprise Linux (RHEL)
- Oracle Enterprise/Unbreakable Linux (OEL)
and not on other distributions like Ubuntu, Slackware, aso ...
The several Linux distributions provide different advantages and drawbacks. Should one of this advantage be very important (critical) for you, you should choose this distribution. However if you find the same amount of advantages on all the available distributions, you'll have to valuate these drawback/advantages to identify the most important ones for you (i.e easy network administration, support of some filesystems, aso ...)
Please also considerate the available know-how in your organization, why choose Red Hat if you already have several very good SuSE administrators in your team, or reverse, of course ...
The following lists will present the Oracle-related advantage and drawbacks offered by each Oracle distribution.
SuSE Linux Enterprise Server (SLES)
+ Very large package repository
+ Centralized administration tool (YaST), allowing to solve a large part of the administration tasks: disk management (including Logical Volume Manager), User handling, Network configuration, and so on ...
+ SLES is by default delivered with OCFS2, the Oracle Cluster Filessystem. This could be useful for Failover clusters (Clusterware) or for RAC (Real Application Clusters) deployment
+ Heartbeat is provided by SuSE as a Failover cluster technology. Even if not officially supported by Oracle, Heartbeat offers a quite interesting protection level for other services like FTP servers, WEB servers, aso ...
- YAST uses it’s own configuration files, sometimes it overwrites self-made configurations
- SLES is not (anymore) the preferred/development platform for Oracle
Red Hat
+ On Red Hat 4, Oracle supports 9iR2, 10gR1, 10gR2 and 11gR1, which is not possible on SLES distributions
+ Satellite Server centralized update and package management tool
+ Very fast updates or patches in case of any vulnerability. Sometimes RedHat provides daily new updates
- Distribution has generally not the most recent available packages versions included
- Compared to SuSE, less software is included
- Red Hat cluster is not really on the quality level we expected. Some issues in the Oracle start stop procedures have been identified in Red Cluster 5 making the cohabitation of two Oracle instances in parallel on the same cluster impossible
- In parallel we had the "opportunity" to test the support for the Red Hat Cluster technology and we had to admit that it was far from our expectations J
- Unfortunately OCFS2 must be separately loaded and installed (through rpm) on Red Hat (directly included into SLES2)
Oracle Enterprise/Unbreakable Linux (OEL)
+ Same advantages than Red Hat, plus :
+ Well configured for Oracle, favoured by Oracle and current development platform for Oracle. All in one J
+ OCFS2 packages missing with Red Hat are already available, this makes Clusterware even more interesting
+ ASMLibs also directly available in OEL
+ Based on Red Hat (but with Oracle support, which could be a benefit compared to the Red Hat support)
+ Clusterware supported by Oracle as soon as OEL has been licensed. Clusterware come a serious competitor to Red Hat Cluster or Veritas cluster or even against pure Open Source solutions like Heartbeat and DRBD
+ Support/License pricing seems much cheaper
- Some extra (hardware provider) components could not be easily installed because this distribution is whether a Red Hat nor a SuSe and therefore unknown for some packages
Summary
For pure Oracle business, Oracle Enterprise Linux (and therefore "Clusterware for Free") seems to be the most interesting choice and Trivadis is also convinced that this would be the right Linux to deploy for Oracle based business. However do not underestimate the current Linux situation in your environment, never change a running system, if you are fully happy on SLES, use SLES, the same for Red Hat. Another issue could be that other applications beside Oracle are not OEL certified, therefore a withdraw to RHEL could be necessary.
The purpose of this small post is to explain a smooth way to change the archivelog destination within a running Data Guard environment. Such an operation is not so complex, however the goal will be to keep everything online and avoid the creation of unnecessary parameters...
Let's consider that Fast Start Failover is activated.
The Data Guard configuration is DB10TST2 and the two databases are DB10TST2_SITE1 and DB10TST2_SITE2. Currently the archive destination is /u90/archive/DB10TST2, it should be changed to /app/oracle/admin/DB10TST2/arch. This directory is the default archive log destination on primary but also the reception directory on the standby side.
Let's start the procedure.
First of all change the log_archive_dest_1 to the new destination on both the primary and standby databases. To make sure that the parameter is really considered, we change it for the instance "DB10TST2" and for "*".
ALTER SYSTEM SET log_archive_dest_1='location="/app/oracle/admin/DB10TST2/arch", valid_for=(ONLINE_LOGFILE,ALL_ROLES)' scope=both sid='*';
ALTER SYSTEM SET log_archive_dest_1='location="/app/oracle/admin/DB10TST2/arch", valid_for=(ONLINE_LOGFILE,ALL_ROLES)' scope=both sid='DB10TST2';
As we can see, the parameter log_archive_dest_1 can be changed online into the running instances. This change must be performed BEFORE the changes of the Data Guard property StandbyArchiveLocation. If not, log_archive_dest_1 and StandbyArchiveLocation will differ and the Data Guard broker will automatically create a parameter log_archive_dest_2 which is of course not wished.
Therefore in a second step, change the StandbyArchiveLocation for the both sites with dgmgrl:
edit database 'DB10TST2_SITE1' set property StandbyArchiveLocation = '/app/oracle/admin/DB10TST2/arch';
edit database 'DB10TST2_SITE2' set property StandbyArchiveLocation = '/app/oracle/admin/DB10TST2/arch';
Verify that all is OK and that no log_archive_dest2 appeared on the standby side, by checking the alert.log file of the both instances.
To be sure that Data Guard reacts correctly to this new configuration, it is possible to disable and enable the configuation :
DGMGRL> disable fast_start failover;
Disabled.
DGMGRL> disable configuration;
Disabled.
DGMGRL> enable configuration;
Enabled.
DGMGRL> enable fast_start failover;
Enabled.
DGMGRL> start observer;
Have fun with Data Guard.
Did you never fight with "ORA-01031: insufficient privileges" errors while trying to connect with "/ as sysdba" to your local database.
No jealousies, both Unix and Windows platforms could be concerned
. Without entering in the details, this small post will try to present some of the most common reasons of "ORA-01031: insufficient privileges" while trying to connect locally on a database.
Concerning UNIX
First of all, to avoid this error, the "oracle" user (i.e oracle) must be in the <OSDBA> group (dba). On some platforms, make sure that the dba group is really called "dba" (as advised in the documentation by the way) and not something like "osdba" for instance. See the Metalink Note 308151.1 about this topic. On HP-UX, Oracle 10.2.0.1 is delivered with an "hard coded" <OSDBA> group name in the file $ORACLE_HOME/rdbms/lib/config.c :
cat config.c
/* SS_DBA_GRP defines the UNIX group ID for sqldba adminstrative access. */
/* Refer to the Installation and User's Guide for further information. */ #define SS_DBA_GRP "dba"
#define SS_OPER_GRP "dba" char *ss_dba_grp[] = {SS_DBA_GRP, SS_OPER_GRP};
If the <OSDBA> group is not called "dba", the Oracle installation will be successful but the "connect /" will result in "ORA-01031: insufficient privileges" errors.To fix this situation you can :
- change "dba" to your <OSDBA> group name in config.c
- rename your <OSDBA> group to "dba" in /etc/group Another common reason for this kind of errors, could be that the ORACLE_HOME has been copied from another server to the using a simple scp command with the user oracle. As a consequence, all the sticky bits will be lost, for instance
/u00/app/oracle/product/rdbms10203/bin/ [rdbms10203] ls -l oracle
-rwxr-x--x 1 oracle dba 35471360 Aug 22 10:09 oracle instead of :
-rwsr-s--x 1 oracle dba 284361408 Aug 21 16:30 oracle
Or
/u00/app/oracle/product/rdbms10203/bin/ [rdbms10203] ls -lrt oradism
-r-sr-s--- 1 oracle dba 75648 Aug 22 10:10 oradism instead of :
-r-sr-s--- 1 root dba 75648 Apr 11 2007 oradism
It is possible to fix back the right permissions on an ORACLE_HOME copied as oracle (and not as root). Start the following scripts:
$ORACLE_HOME/bin/relink all
$ORACLE_HOME/root.sh (as root)
$ORACLE_HOME/install/changePerm.sh
Of course an ORACLE_HOME should be copied as root with the following statement :
tar cvzpf - {your_oracle_home} | ssh {remote_server} "(cd $PWD; tar xvzpf -)"
Note the "p" flag to preserve the permissions (sticky bits and root ownership).
Concerning the Windows platform
Please find below the main reasons why it is not possible to connect with "connect /" even if the ORACLE_SID is correctly set:The Windows user under which oracle is installed may not be added to the ORA_DBA group (this group is installed with the oracle software).The appropriate service entry is not set correctly in sqlnet.ora :
SQLNET.AUTHENTICATION_SERVICES=(NTS)
Also verify the registry entry for non-domain authorization (run regedit):
Set in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE
as string value:
OSAUTH_PREFIX_DOMAIN=FALSE
Before you (re-)test the connection, reboot your system (just to be on the safe side Wink ...) On Oracle8, as precised by Siba Prasad the LOCAL variable (equivalent to the UNIX TWO_TASK variable) helps to connect to the local instance if setting the ORACLE_SID doesn't help. LOCAL avoids to precise the Oracle*Net alias while connecting to a database (LOCAL can be set to a tnsnames.ora entry). However doing so, the direct (without Oracle*Net) connection should still be fixed. See http://dbathoughts.blogspot.com/2008/03/using-local-environment-variable.html.
From time to time the Data Guard broker log file shows ORA-04031 errors. The primary site cannot connect to the standby site anymore.
DG 2008-04-09-23:31:44 0 2 0 RSM0: Failed to connect to remote database MYDB_SITE2. Error is ORA-04031
It seems that the available shared_pool is not sufficient on the standby site. Let's have a deeper look.
On the both sides (primary and standby) we have sga_target = sga_max_size = 800M :
SYS@MYDB > show parameter sga_
NAME TYPE VALUE
------------------ ----------- ------------
sga_max_size big integer 800M
sga_target big integer 800M
On the production server the current shared pool size is about 500 MB (automatically tuned by Oracle):
SYS@MYDB > @ssinipar
Wrote file temp.tmp
Enter value for parameter: shared_pool_size
Parameter Session
---------------------------- -------------
__shared_pool_size 494927872
shared_pool_size 100663296
However, on the standby server the current shared pool site is about 240 MB :
Parameter Session
-------------------------- -------------
__shared_pool_size 239075328
shared_pool_size 100663296
Furthermore, the standby database has less free shared pool than the primary database, it is also quite fragmented.
On standby:
POOL NAME BYTES
------------ ---------------- ----------
shared pool free memory 18534000
On primary:
POOL NAME BYTES
------------ ---------------- ----------
shared pool free memory 68049224
On the primary site a lot of resize operations have been performed by Oracle, it seems that this application has a higher parsing than I/O activity :
COMPONENT CURRENT_SIZE OPER_COUNT LAST_OPER_TIME
-------------------- ------------ ---------- -------------------
shared pool 494927872 142 10-APR-008 00:47:23
streams pool 4194304 43 09-APR-008 22:05:46
DEFAULT buffer cache 260046848 185 10-APR-008 00:47:23
The streams pool size can be explained by data pump, which is used to perform the logical backup on the primary side.
The database requires much more shared pool that database block cache. On the standby side only 46 (23+23) SGA resize operations took place, compared to the 370 (142 + 43 + 185) and more on the primary side.
COMPONENT CURRENT_SIZE OPER_COUNT LAST_OPER_TIME
------------------------- ------------ ---------- -------------------
shared pool 239075328 23 10-APR-008 15:11:31
DEFAULT buffer cache 520093696 23 10-APR-008 15:11:31
It is also very interesting to observe that the standby database handles much more database blocks than the primary database:
On the standby side:
select count(*) from v$bh;
COUNT(*)
----------
61748
On the primary database:
select count(*) from v$bh;
COUNT(*)
----------
30591
This can be explained by the fact that the standby database only performs recovery. The blocks are built from the standby redo log file, loaded into the buffer cache in order to be applied on the database which is in recovery mode.
However, the Data Guard broker and its processes also need memory in the shared pool to work. Unfortunately, because the standby database allocates much more database block cache memory than shared pool in the SGA, some ORA-4031 can occur.
Example statement on the standby site :
SELECT dest_id FROM v$archive_dest WHERE (target='LOCAL' OR target='PRIMARY') AND destination = '/export/oradata/MYDB/data99/archivelog'
AND (valid_type <> 'ONLINE_LOGFILE') AND (valid_role <> 'PRIMARY_ROLE') AND (dest_id <= 10) AND (dest_name not in ( select alternate from v
$archive_dest))
Several conclusions can be drawn from this analysis:
- After activation, the standby database will have a SGA configuration which may be in conflict with the application requirements (more cache size than shared pool)
- Oracle does not seem to transfer the SGA modifications from the primary to the standby site but sets the parameters to the requirements of each role (more cache size on the standby site)
- It is not a good idea to set a smaller SGA size on the standby side in order to save some memory space, this error will be amplified.
In some cases, flushing the shared pool could help:
SYS@MYDB > alter system flush shared_pool;
System altered.
A solution could be to set the shared_pool_size to a value corresponding to the lowest limit allowing to operate the standby database without (or with less) ORA-4031 errors. This low-limit seems to be at around 300-350 MB.
Until 10.2.0.3, the Data Guard (and broker) technology suffered from several weaknesses. One of the "highlights" was that no Virtual IP management was provided by Oracle. This has quite important consequences for the clients as soon as the primary server is down (no IP responding to the TCP requests e.g. of a connection build up). In this case, the failover from the first to the second address (in the address list of the TNS entry used by the client) could take a long time, depending on network/TCP timeouts. The RAC/Clusterware architecture solved this issue by introducing the Virtual IPs (VIP) which where always available (maybe moved to a surviving node).
Of course several solutions are available to overcome that problem, for instance managing your own VIP with self-made scripts (and all the subsequent risks, you try to fake a failover cluster!), combine Data Guard with Failover Cluster technology (e.g. Heartbeat under Linux, Veritas Cluster Services), and so on ...
All that solutions have as almost as many drawbacks as advantages, the most important are:
- How can we garantee that the VIP is always started on the Primary database side?
- How is it possible to prevent two Virtual IP running in parallel in the cluster?
However, one of the mentioned solutions is almost "mandatory" for Oracle clients before 10.2.0.3.
Starting with 10.2.0.3 on client and server side, Oracle proposes the OUTBOUNT_CONNECT_TIMEOUT parameter (not very well documented though). This option must be set in the sqlnet.ora file and defines the maximum number of seconds to be waited before trying to access the second, third etc. address in the ADDRESS_LIST, independent of TCP/IP timeouts. Examples:
In sqlnet.ora:
OUTBOUND_CONNECT_TIMEOUT=5
In tnsnames.ora on the client side or in the LDAP server:
PCMDB =
(DESCRIPTION =
(ADDRESS_LIST=
(ADDRESS = (PROTOCOL = TCP)(HOST = server1 )(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = server2 )(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = PCMDB_RW )
)
)
PCMDB_RW is a service created beforehand and activated through a "startup database" trigger (see end of the post). This makes sure, that the client can only connect to the primary side. If after 5 seconds (in our case), server1 is still unreachable, the client will try to connect to server2 (ignoring existing TCP/IP parameters).
Unfortunately, during our tests we observed that this parameter requires the remote_listener parameter to be set on the server side, at least for Linux/Sun(/... ?) clients. This for the non-error situation that a client first tries the standby database (and therefore should failover to the primary). For windows clients, patch 6038241 (10.2.0.3.7P) can be installed (only available for 32 Bit), see metalink note 342443.1. With this patch, the remote_listener parameter does not have to be.
So the Windows clients have a valid/stable solution using the patch.
The Linux clients however suffer from the side effects of the remote_listener parameter: The remote_listener parameter is used to register to other (remote) listeners in order to inform them about the current instance load. This means that in case of crash of the primary server and activation of the standby database, the service PCMDB_RW is known by the listener of the standby side as being offered by the new primary (ok) and the old primary (not ok)!
This situation may result in "ORA-12560: TNS:protocol adapter error" errors.
The solution is quite simple: the listener should be "reloaded" on the new primary side (old standby) when the standby database becomes the primary. This could be easily automated with a script and a DB_CHANGE_ROLE trigger. See the code below :
Script to reload the listener:
#!/bin/ksh
# Set the ORACLE environment variables (ORACLE_HOME, PATH, aso ...)
lsnrctl reload LISTENER_PCMDB
Below the trigger to reload the listener:
CREATE OR REPLACE TRIGGER RELOAD_LISTENER AFTER DB_ROLE_CHANGE ON DATABASE
BEGIN
dbms_scheduler.create_job(job_name=>'reload_lsn',
job_type=>'executable',job_action=>
'/u00/app/oracle/local/custom/bin/reload_lsn_PCMDB.ksh',
enabled=>TRUE);
END;
/
With this solution, both Linux and Windows clients do not require any VIP or Failover Cluster anymore to fully benefit from a Data Guard architecture. The only 'restriction' is to use 10.2.0.3 Oracle client releases. Even 10.2.0.2 does not work properly. Of course, wait for 10.2.0.4 and hope that the "problem" is fixed could be another solution :-)
Appendix : startup trigger for "_RW" service :
create or replace trigger service_trigger after
startup on database
declare
db_name varchar(512);
db_domain varchar(512);
database_role varchar(16);
service_names varchar(512);
instance_name varchar2(16);
begin
select value into service_names from v$parameter where name = 'service_names';
select value into instance_name from v$parameter where name = 'instance_name';
select value into db_name from v$parameter where name = 'db_name';
select value into db_domain from v$parameter where name = 'db_domain';
select database_role into database_role from v$database;
if database_role = 'PRIMARY' then
service_names := rtrim(service_names || ', ' || db_name || '_RW.' || db_domain, '.');
else
service_names := rtrim(service_names || ', ' || db_name || '_RO.' || db_domain, '.');
end if;
execute immediate 'alter system set service_names = ''' || service_names || ''' scope=memory sid=''' || instance_name || '''';
end;
/
This short post presents how to change online the Data Guard configuration files location (parameters dg_broker_config_fileX) where X is 1 or 2.
REM ================================================
REM renaming of the dg_broker_config_file parameters
REM=================================================
set head off
set feed off
set echo off
set verify off
set trimspool on
ACCEPT site_number CHAR PROMPT 'Enter site number <2: ' DEFAULT '2'
var filename_dg1 varchar2(2000)
var filename_dg2 varchar2(2000)
spool rename_dg_broker_config_file_do.sql
select 'alter system set dg_broker_start=false scope=memory'||chr(10)||'/'
from dual
/
begin
select '/u00/app/oracle/admin/'||name||'/pfile/dr1_'||name||'_SITE'||&site_number||'.dat' into :filename_dg1
from v$database;
select '/u00/app/oracle/admin/'||name||'/pfile/dr2_'||name||'_SITE'||&site_number||'.dat' into :filename_dg2
from v$database;
end;
/
select '!mv '||value||' '||:filename_dg1
from v$parameter
where name = 'dg_broker_config_file1'
/
select '!mv '||value||' '||:filename_dg2
from v$parameter
where name = 'dg_broker_config_file2'
/
select 'alter system set dg_broker_config_file1=''/u00/app/oracle/admin/'||name||'/pfile/dr1_'||name||'_SITE'||&site_number||
'.dat'''||chr(10)||'/'
from v$database
/
select 'alter system set dg_broker_config_file2=''/u00/app/oracle/admin/'||name||'/pfile/dr2_'||name||'_SITE'||&site_number||
'.dat'''||chr(10)||'/'
from v$database
/
select 'alter system set dg_broker_start=true scope=memory'||chr(10)||'/'
from dual
/
select chr(10) from dual;
spool off
set echo on
set feed on print :filename_dg1
print :filename_dg2
spool rename_dg_broker_config_file_do.log
@@rename_dg_broker_config_file_do.sql
spool off
We could consider that the script is not REALLY online because the Data Guard broker is stopped
, but this has no impact on the instance availability.
At a customer site (Oracle 10.2.0.3, AIX, CIO enabled), the following error occurred during some "datafile move" actions on the database . Note that this post only presents the solution on a real-life problem without knowing exactly if the issue was related to CIO, flashback or any Oracle or AIX “feature” :
- Database was previously in a Data Guard (DG) configuration
- Flashback is activated for the database
The performed actions were:
1. offline tablespace, test if the file can be copied (with a simple “cp” command). File was still locked (may be due to CIO).
2. online tablespace, an error occurs :
SQL> alter tablespace CONTRAINTESG2 online;
alter tablespace CONTRAINTESG2 online
*
ERROR at line 1:
ORA-01113: file 22 needs media recovery
ORA-01110: data file 22: '/export/oradata/G2DBR1/data01/contraintesg2_db_01.dbf'
Of course we directly tried to recover the datafile :
SQL> recover datafile 22;
ORA-00283: recovery session canceled due to errors
ORA-38727: FLASHBACK DATABASE requires a current control file.
No way ! Impossible to online the datafile. We also observed that the database datafiles aren't backed up since a while (we checked in the RMAN backup log file). However, the actual control file knows the datafiles (visible with "alter database backup controlfile to trace"). The controlfile is also in a correct state (CURRENT) :
SQL> select controlfile_type from v$database;
CONTROL
-------
CURRENT
It seems that during the “recover datafile”, Oracle tries to perform a flashback of the database and complains because it feels it is a BACKUP controlfile. The control file is however really a CURRENT controlfile. This problem is described in the note 438437.1. No restore point has been created :
SQL> select * from v$restore_point;
no rows selected
What we had to do to solve the problem ? :
1. Remove the flashback settings for the database.
SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database flashback off;
SQL> alter database open;
SQL> r
1* select file#,status from v$datafile
FILE# STATUS
---------- -------
1 SYSTEM
2 ONLINE
...
21 ONLINE
22 RECOVER
23 RECOVER
24 ONLINE
24 rows selected.
2. Try to recover the datafile, an ORA-600 appeared :
SQL> recover datafile 22;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [kccfhb_1], [4], [], [], [], [], [], []
3. Creation of a new controlfile with NORESETLOGS (statement coming from "alter database backup controlfile to trace")):
SQL> @/export/soft/oracle/admin/G2DBR1/udump/cr_ctl.sql
Control file created.
SYS> recover database;
SYS> alter database open;
All files are online :
SYS> select file#,status from v$datafile;
FILE# STATUS
---------- -------
1 SYSTEM
2 ONLINE
...
22 ONLINE
23 ONLINE
24 ONLINE
24 rows selected.
The last step is of course to create the temporary tablespaces 
Last update 14.01.2008 :
After some additionnal tests with the customer we observed that the following scenery makes a Data Guard configuration in a very dangerous situation:
On a Data Guard configuration, perform a failover, then a switchover to get the initial situation (after reinstate, of course, from the old primary).
Afterwards, perform backups with a RMAN catalog
Add new datafiles on the database
These last added datafiles won't be backed up by RMAN anymore !
Only a backup without catalog will consider and save these datafiles.
Performing a RESYNC catalog doesn't solve the problem, the catalog doesn't know these new datafiles.
The only solution is to re-create the control file ! A SR has been opened by the customer.
Managing RMAN backups with FRA (Flashback Recovery Area) is definitively not "out of the box". Consider the following situation: the RMAN backup are performed in the Flashback Recovery Area in a first step (before going to tape). In a second step, the backup are saved on tape from the FRA with the following policies : DAILY, WEEKLY, MONTHLY. Each policy has its own retention (5 days, 5 weeks, 13 months).
The used media manager is NetBackup from Veritas. The backup scripts look like:
Backup of the database:
run {
allocate channel bck_chan1 type disk;
backup as compressed backupset incremental level 1 tag inc1_daily_dbf database;
backup current controlfile for standby
tag inc1_daily_std_ctrl;
sql "alter system archive log current";
backup filesperset 300 archivelog all not backed up 2 times tag inc1_daily_arch;
backup current controlfile tag inc1_daily_cur_ctrl;
backup spfile tag inc1_daily_spfile;
sql "alter database backup controlfile to trace";
backup filesperset 300 archivelog until time 'SYSDATE - 3' not backed up 2 times delete input tag inc1_daily_arch;
release channel bck_chan1;
}
Backup of the FRA:
run
{
allocate channel bck_chan1 type 'SBT_TAPE';
send 'NB_ORA_CLASS=Net0WinOracleGP, NB_ORA_SERV= NetBackupServ, NB_ORA_SCHED=DaillySchedule,NB_ORA_CLIENT=MyClient';
backup recovery area;
release channel bck_chan1;
}
If you didn't setup any RMAN catalog house keeping from the begin you could be confronted with large crosscheck/delete times the first time. The goal is to remove the RMAN catalog entries because they were already deleted from the media manager (out of the retention period).
A way to optimize the crosscheck/delete procedure, is to perform a first crosscheck with the NetBackup parameters (an access to the tape media manager will be performed) : the backup no more existing on tape will be set to “EXPIRED”. In a second step delete the “EXPIRED” backup with a new channel without NetBackup channel parameters (no tape access required, this will be faster):
For instance, to delete all the backups older then 100 days:
allocate channel for maintenance type 'SBT_TAPE';
# Set the NetBackup parameters
send 'NB_ORA_CLASS=Net0WinOracleGP, NB_ORA_SERV=NetBackupServ, NB_ORA_SCHED=DaillySchedule,NB_ORA_CLIENT=myclient';
# Perform the crosscheck
crosscheck backup completed before 'sysdate - 100';
# This second allocate avoids any netbackup access because no NetBackup
# variables are set.
allocate channel for maintenance type 'SBT_TAPE';
# Delete the entries in the catalog :
delete force noprompt expired backup completed before 'sysdate - 100';
But the problem is not yet totally solved. Indeed some entries have been deleted from the catalog which MAY (!) be still available on the FRA ! This could be due to the fact that the FRA is very large. This can be easily verified by a simple “ls” command in the FRA :
ls -lrt /export/oradata/ORACLE_SID/data99/flash_recovery_area/ORACLE_SID_SITE1/backupset/
In fact Oracle doesn't delete the FRA backupset as long as space if available in the FRA. Also the RMAN command “delete “EXPIRED”” does NOT delete the FRA backupsets.
The consequence is very simple: on the next scheduled daily backup all the backupset still available in the FRA will be backed up again on the Media Manager. This could generate a bad surprise because the backup will suddenly take much more time (depending on the size of the FRA).
It is required to suppress the backupset manually from the FRA through the following command:
delete obsolete recovery window of 100 days device type disk;
The FRA seems to clear the old backup set automatically when space is required, but when lot of space is available in the FRA the backup remain and must be cleaned manually.
It is also worth to mention that this effect also takes place even if only one retention period has been defined. The only factor limiting the number of backup stored in the FRA is the place available and not the retention period of the RMAN backup strategy.
One of the numerous hidden FRA “feature”. However the benefit of the FRA in a Dataguard environment largely covers the several drawbacks.
In order to change the Dataguard configuration from MAXIMUM PERFORMANCE to MAXIMUM AVAILABILITY, Oracle doesn't require to bounce the databases anymore since 10.2.0.3.
In fact this is not always the case. The only way to verify if it will occur is:
1. change log transport mode
edit database 'MYDB_SITE1' set property LogXptMode = 'SYNC';
edit database 'MYDB_SITE2' set property LogXptMode = 'SYNC';
2. look for the following entry in the alert.log file of the PRIMARY database :
Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
If this string has been found, the Dataguard configuration can be changed smoothly (without database reboot) to MAXIMUM AVAILABILITY as described below :
edit configuration set protection mode as 'MaxAvailability';
Of course the standby redo log files must exist, see previous posts.
If the entry in the alert.log file has not been found a workaround could be to to set the LogXptMode property to 'ASYNC' and then back to 'SYNC'.
Many thanks to Hervé who was the first one to discover this by Trivadis :-)
My Grid Control console is often polluted by warnings about wait events of the network event class :

Is my Data Guard setup really suffering from poor performance?
I operate with the Maximum Availability mode.
We will investigate why the Grid Control warnings occur. The database is up since about 3 days (72 hours). First of all try to identify the wait classes which generated the most important wait times:
set lines 130
select * from V$SYSTEM_WAIT_CLASS
order by 5
/
WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS TOTAL_WAITS TIME_WAITED
------------- ----------- ------------------- ----------- -----------
4217450380 1 Application 1472 434
3875070507 4 Concurrency 1178 1338
3290255840 2 Configuration 1097 5250
3386400367 5 Commit 3512 6981
4166625743 3 Administrative 4906 111527
2000153315 7 Network 2549790 325016
1740759767 8 User I/O 1384276 427440
4108307767 9 System I/O 3151272 646415
1893977003 0 Other 251373 2207566
2723168908 6 Idle 3720999 639262672
In the database, after a performance alert occurred in Grid Control, we can identify the problem in V$WAITCLASSMETRIC_HISTORY :
select BEGIN_TIME, END_TIME, DBTIME_IN_WAIT, TIME_WAITED WAIT_COUNT
from V$WAITCLASSMETRIC_HISTORY
where WAIT_CLASS# = 7; # 7 for network
BEGIN_TIME END_TIME DBTIME_IN_WAIT TIME_WAITED WAIT_COUNT
-------------------- -------------------- -------------- ----------- ----------
25-SEP-2007 11:00:54 25-SEP-2007 11:01:54 21.0387822 2143.1923 19262
25-SEP-2007 10:59:54 25-SEP-2007 11:00:54 51.2713653 2147.2517 19047
25-SEP-2007 10:58:54 25-SEP-2007 10:59:54 55.8887214 2480.2726 18405
25-SEP-2007 10:57:53 25-SEP-2007 10:58:54 44.2017456 2374.9075 21223
25-SEP-2007 10:56:53 25-SEP-2007 10:57:53 33.1204214 2676.6979 28336
25-SEP-2007 10:55:53 25-SEP-2007 10:56:53 37.3184263 3498.7629 30965
25-SEP-2007 10:54:54 25-SEP-2007 10:55:53 27.779576 3130.5472 29403
25-SEP-2007 10:53:54 25-SEP-2007 10:54:54 26.117611 2752.5801 23430
Now we will try to analyze in depth the events mentioned in Grid control :
"Database Time Spent Waiting (%)" is at 88.26568 for event class "Network"
Let’s have a look at the primary site. According to the white paper presented in the Metalink note 387174.1, the event “LNS wait on SENDREQ” covers the sequence of the following actions (when a commit transaction is propagated to the standby database).
(Extract of the Metalink note:)
The RFS process receives the redo being sent by LNS and completes the I/O to the standby redo log (important to understand the analysis below).
V$SYSTEM_EVENT is the view which will allow us to identify the current waits events :
select EVENT, TOTAL_WAITS, TOTAL_TIMEOUTS, TIME_WAITED, AVERAGE_WAIT
from v$system_event
where event like '%LNS%'
or event like '%LGWR%'
/
EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
------------------------- ----------- -------------- ----------- ------------
. . .
LNS wait on SENDREQ 125643 0 156673 1.25
. . .
On the standby side we perform almost the same select (a few seconds later) , to identify the LNS/RFS related wait times :
select EVENT, TOTAL_WAITS, TOTAL_TIMEOUTS, TIME_WAITED, AVERAGE_WAIT
from v$system_event
where event like '%RFS%'
order by 4
/
EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
------------------------- ----------- -------------- ----------- ------------
. . .
RFS write 125270 0 121825 .97
. . .
We clearly observe that out of the 156673 ms waited because of "LNS wait on SENDREQ" (on the primary, see above for detailed description of this event), 121825 ms are due to the waits on writing to the standby redo log files on the standby side ("RFS write")
In the whole propagation/synchronization process between primary and standby the most important part is simply the writing on the standby redo log files (normal I/Os to keep the transactions on the standby side) and not the network as expected.
It is worth to mention that the "redo write time" on the primary is 128333 ms which means that the 121825 ms of waits for the RFS writes are fully acceptable (normal I/O operations/performance)
During these 3 days while the database was up and running having waits of about 120000 to 130000 ms (120 to 130 s) is absolutely negligible!
As a conclusion: although the most important waits on the primary database are related to the network, there are negligible in absolute values. 90% of 120 seconds of waits in 3 days is still only 1,66 second of wait per hour in average
More details on www.trivadis.com -> publications :
http://www.trivadis.com/Images/Dataguard_network_tuning_tvd_layout_tcm16-16758.pdf
You wanna create an Oracle Dataguard configuration working in Maximum availability. This configuration uses of course standby redo log files. These files must be available on the primary and standby side (because of switchover : primary database becomes standby database). In your setup process you decided to create the standy redo log files before duplicating the primary database. After the creation of the Dataguard configuration, the standby database seems to miss the standby redo log files.
RFS[3]: No standby redo logfiles created
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_86_599565590.arc'
Below the description of the problem and the solution.
First of all you create the standby redo log files on the primary database :
alter database add standby logfile group 4
('/u00/oradata/DG/sredog4m1DG.dbf',
'/u01/oradata/DG/sredog4m2DG.dbf') SIZE 6M reuse;
alter database add standby logfile group 5
('/u00/oradata/DG/sredog5m1DG.dbf',
'/u01/oradata/DG/sredog5m2DG.dbf') SIZE 6M reuse;
alter database add standby logfile group 6
('/u00/oradata/DG/sredog6m1DG.dbf',
'/u01/oradata/DG/sredog6m2DG.dbf') SIZE 6M reuse;
alter database add standby logfile group 7
('/u00/oradata/DG/sredog7m1DG.dbf',
'/u01/oradata/DG/sredog7m2DG.dbf') SIZE 6M reuse;
Backup your primary database for standby with RMAN :
RMAN> connect target /
RMAN> backup incremental level 0 database;
RMAN> backup current controlfile for standby;
RMAN> backup archivelog all delete input;
Start (nomount) the standby side :
SQL> startup nomount;
ORACLE instance started.
From the primary server, start RMAN to duplicate for standby :
RMAN> connect target /
connect connected to target database: DG (DBID=1512896598)
RMAN> connect auxiliary sys/manager@DG_SITE2
connected to auxiliary database: DG (not mounted)
RMAN> duplicate target database for standby dorecover nofilenamecheck;
After the duplicate verify the status on the standby database :
SQL> select database_role from v$database;
DATABASE_ROLE
----------------
PHYSICAL STANDBY
The standby redo log files have been correctly duplicated, but the files are NOT created on the filesystem.
SQL> select group# from v$standby_log;
GROUP#
----------
4
5
6
7
Now you can create your dataguard configuration with dgmgrl from the primary side :
connect sys/manager
create configuration 'DG' as
primary database is 'DG_SITE1'
connect identifier is 'DG_SITE1.trivadistraining.com';
add database 'DG_SITE2' as
connect identifier is 'DG_SITE2.trivadistraining.com'
maintained as physical;
edit database 'DG_SITE1' set property StandbyArchiveLocation='/u00/app/oracle/admin/DG/arch';
edit database 'DG_SITE1' set property StandbyFileManagement='AUTO';
edit database 'DG_SITE1' set property LogXptMode='ARCH';
edit database 'DG_SITE2' set property StandbyArchiveLocation='/u00/app/oracle/admin/DG/arch';
edit database 'DG_SITE2' set property StandbyFileManagement='AUTO';
edit database 'DG_SITE2' set property LogXptMode='ARCH';
enable configuration;
show configuration;
On the standby side , following occured (extract of the alert.log file of the standby database) :
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT NODELAY
Sun Sep 23 14:52:54 2007
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[3]: Assigned to RFS process 7535
RFS[3]: Identified database type as 'physical standby'
RFS[3]: No standby redo logfiles created
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_85_599565590.arc'
Sun Sep 23 14:52:55 2007
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_85_599565590.arc
Media Recovery Waiting for thread 1 sequence 86
Sun Sep 23 14:54:39 2007
RFS[3]: No standby redo logfiles created
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_86_599565590.arc'
RFS[3]: No standby redo logfiles created
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_87_599565590.arc'
Sun Sep 23 14:54:40 2007
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_86_599565590.arc
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_87_599565590.arc
After having enabled the Dataguard configuration, the standby redo log disappeared :
SQL> select group# from v$standby_log;
no rows selected
The standby database complains in the alert.log file :
RFS[3]: No standby redo logfiles created
RFS[3]: Archived Log: '/u00/app/oracle/admin/DG/arch/DG_1_86_599565590.arc'
The solution is easy, stop the log apply mode and create the standby redo log files on the standby database :
On the primary database (in dgmgrl) :
DGMGRL> edit database 'DG_SITE2' set state = 'LOG-APPLY-OFF';
Succeeded.
On standby side (see the script above) :
SQL> @create_standy_log.sql
Database altered.
Database altered.
Database altered.
Database altered.
On the primary database (in dgmgrl) acivate the standby site again :
DGMGRL> edit database 'DG_SITE2' set state = 'ONLINE';
Succeeded.
To verify, in sqlplus on the primary side :
SQL> alter system archive log current;
System altered.
On the standby side , the database doesn't complain anymore and uses the standby redo log files :
RFS[5]: Successfully opened standby log 4: '/u00/oradata/DG/sredog4m1DG.dbf'
Sun Sep 23 14:57:28 2007
Media Recovery Log /u00/app/oracle/admin/DG/arch/DG_1_91_599565590.arc
Media Recovery Waiting for thread 1 sequence 92
Note also that even if the mode is still Maximum Performance and the transport mode ARCH, the standby redo log files will be used to store the archived files from the primary.
Sometimes, when switching to new technologies/environments, it is still
nice to be able to use some old software. Although there may be
equivalents ;-) . In my case, a 6 year SuSE user, I just finished
installing a really nice Linux variant on my laptop: Ubuntu 7.04.
Although Ubuntu offers all the stuff an IT consultant requires, I missed
one tool I really got used to: xv. To check and modify pictures and grab
parts of the screen.

I could not find an xv package for Ubuntu, even after installing the
nice extension coming along with automatix2 (http://www.getautomatix.com):
'apt-cache search xv' revelaed nothing...
After googling the internet a bit, I found some information about how to
create an xv package on Ubuntu (for instance on
http://bok.fas.harvard.edu/debian/xv/index.html). I try to consolidate
them in this blog entry and hope you find the useful.
Before installing xv on Ubuntu, you will need some preparation. First of
all, install the libraries required during the xv compilation and
installation.
If your computer is behind a proxy, set the following variables to use
the proxy:
export http_proxy=http://myproxy:8080
export ftp_proxy=http://myproxy:8080
Then install the required packages:
sudo apt-get install xlibs-dev
sudo apt-get install dpkg-dev
sudo apt-get install libc6-dev
sudo apt-get install libtiff3g-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libpng-dev zlib1g-dev
sudo apt-get install libtiff-dev
sudo apt-get install zlib-bin
sudo apt-get install zlibc
sudo apt-get install zlib1g-dev
sudo apt-get install zlib1g
sudo apt-get install libpng3
sudo apt-get install libpng12-dev
sudo apt-get install libpng12-0
After that, download the xv source, the jumbo patches and the debian
patch (remember, Ubuntu is a Debian decendant):
ftp://ftp.cis.upenn.edu/pub/xv/xv-3.10a.tar.gz
http://prdownloads.sourceforge.net/png-mng/xv-3.10a-jumbo-patches-20050501.tar.gz
http://bok.fas.harvard.edu/debian/xv/xv-3.10a-jumbo20050501-1.diff.gz
During installation of xv, the bash shell will be needed. On Ubuntu
7.04, the default shell (/bin/sh) is linked to /bin/dash. Change that
link so that it points to bash (There may be other solutions, but this
worked for me most easily)
ls -lrt /bin/sh
lrwxrwxrwx 1 root root 4 2007-09-14 10:49 /bin/sh -> bash
No we start the real stuff:
tar xvzf xv-3.10a.tar.gz
tar xvzf xv-3.10a-jumbo-patches-20050501.tar.gz
gzip -d xv-3.10a-jumbo20050501-1.diff.gz
cd xv-3.10a
Apply the patches:
patch -p1 < ../xv-3.10a-jumbo-fix-patch-20050410.txt
patch -p1 < ../xv-3.10a-jumbo-enh-patch-20050501.txt
patch -p1 < ../xv-3.10a-jumbo20050501-1.diff
chmod 755 debian/rules
Build the package:
dpkg-buildpackage -rfakeroot -uc -b
Install the package:
sudo dpkg -i xv_3.10a-jumbo20050501-1_i386.deb
sudo dpkg -i xv-doc_3.10a-jumbo20050501-1_all.deb
That was it! I hope I could help you. If you do not want to build the
packages by yourself, just send me a note.