This post was originally published on this site

In this blog post I mainly want to discuss the GoldenGate parameter HANDLECOLLISIONS and do some tests to find out about its reliability. I always felt very uncomfortable whenever I needed to use it. So this post is also about the desire for trust in the involved approach. Besides that I include some implementation details used in a replication setup from DB2 to Oracle.

About GoldenGate

GoldenGate is the Oracle’s ultimate solution for replication between various data sources. In most cases relational databases are in focus. But nowadays you can even replicate your data to Kafka, NoSQL databases, Hive, etc. If we stay with relational databases I want to highlight the fact that it can integrate different DBMS as source and/or target from different vendors, e.g. Oracle database, Microsoft SQL Server, MySQL, PostgreSQL, Teradata, IBM DB2 and more.

About DB2 and the Lack of Flashback Query

This post concentrates on replication from DB2 to Oracle database. Capture of changes is one aspect and fine. We just got some concerns about the question, how to make a consistent initial load. With Oracle database as a source this can be very easily accomplished, even if there is a high load on source database. The technique that is used to assure consistency is making use of flashback queries which get a consistent image of the queried data.

Later on the change data is applied as of this particular point in time. With DB2 there is no such feature which allows to read a table as of a specific time without locking.
The default DB2 isolation level is CS, cursor stability. In that level you see only committed data, but you can not be sure about the timeliness of that data. When you start a query at time x and finish at time y. The currentness of each individual read row can be anything between x and y. Of course this is only true for rows, that where changed during that time. With Oracle you would simply run a flashback query as of x. With DB2 we can’t. Please correct me if we can, this would be very useful!
A while ago I started a discussion on that on twitter: https://twitter.com/MathiasZarick/status/1499430322915659776
No better answer so far.

Create Appropriate Empty Schema on Oracle side

During the setup we need to create tables on Oracle side, which fit to our tables from source. I used the following queries on DB2 to get list of columns, information about primary keys and also secondary indexes. Make sure to have the primary keys on target tables.

To get everything as comfortable as possible I decided to install Oracle Gateway for ODBC to be able to query directly from Oracle side using DB links. Those links will also be of interest when we do the initial loads later. In my examples you will find a db link with the name ODBC which points to the DB2.

List of tables with its columns, restricted to schema DB2INST1:

SELECT tbcreator,
       tbname,
       name,
       colno,
       coltype,
       length,
       scale,
       nulls,
       keyseq
FROM [email protected]
WHERE tbcreator='DB2INST1'
ORDER BY tbcreator,
         tbname,
         colno

After you get the DB2 datatypes for your tables of interest you need to decide a suitable mapping to Oracle databases here. I chose:
VARCHAR2 for CHAR and VARCHAR.
NUMBER for SMALLINT, INTEGER, DECIMAL and FLOAT.
DATE for DATE and TIME.
TIMESTAMP for TIMESTMP.
You might need more mappings for other source datatypes and you might do the mapping differently as you need it.

List of PK columns per table (DB2 z/OS version):

SELECT indexes.name,
       indexes.creator,
       indexes.tbname,
       indexes.tbcreator,
       indexes.uniquerule,
       keys.colname,
       keys.colseq,
       keys.ordering
FROM [email protected] keys,
     [email protected] indexes
WHERE keys.ixname=indexes.name
AND keys.ixcreator = indexes.creator
AND keys.ixcreator='DB2INST1'
ORDER BY indexes.tbcreator,
         indexes.tbname,
         indexes.creator,
         indexes.name,
         colseq

List of PK columns per table (DB2 LUW version):

SELECT const.name,
       const.tbcreator,
       const.tbname,
       cols.name,
       cols.keyseq
FROM [email protected] const,
     [email protected] cols
WHERE const.tbcreator = cols.tbcreator
AND const.tbname = cols.tbname
AND const.tbcreator='DB2INST1'
AND const.constraintyp='P'
AND cols.keyseq != 0
ORDER BY cols.tbcreator,
         cols.tbname,
         cols.keyseq

The queries for secondary indexes – if you want them – query from sysibm.syskeys, sysibm.sysindexes for DB2 on z/OS, respectively from syscat.indexcoluse, syscat.indexes for DB2 on LUW. With the queries above we can create the target tables as desired. You might want to write a little program e.g. in PL/SQL to automate this task.

Setup Steps for Change Data Delivery

We will need GoldenGate for DB2 to run an extract and GoldenGate for Oracle to run a replicat. Luckily it is supported to run the extract as a remote extract, that means you don’t need to install it on the z/OS on mainframe but rather directly on the target database server and connect remotely via TCP/IP using a DB2 client.

DB2 admin needs to install certain libraries and stored procedures on DB2 z/OS side to get this running.
See https://docs.oracle.com/en/middleware/goldengate/core/21.1/installing/installing-db2-z-os.html

By the way, as of version 21 Oracle GoldenGate for DB2 is also available in the microservices architecture, which offers REST APIs to be used for deployment purposes. Until 19c it was only available for GoldenGate for Oracle, but now we can make use of it even with the other DBMSs. Anyway, my example still continues with some excerpts from classic architecture. The particular steps are quite similar with microservices
architecture.

Some DB2 client commands for cataloging for the setup in my lab:

db2 catalog tcpip node zam47 remote zam47 server 50000
db2 list node directory
db2 catalog database SAMPLE as SAMPLE at node zam47
db2 list database directory

With ggsci of Oracle GoldenGate for DB2 create an entry for GoldenGate connection and hide the password in a credentialstore/wallet:

ADD CREDENTIALSTORE
ALTER CREDENTIALSTORE ADD USER ogg PASSWORD <confidential> ALIAS [email protected]
INFO CREDENTIALSTORE
DBLOGIN SOURCEDB SAMPLE USERIDALIAS og[email protected]

For each table of interest you then run “ADD TRANDATA”, e.g.

ADD TRANDATA DB2INST1.EMPLOYEE
ADD TRANDATA DB2INST1.DEPARTMENT
ADD TRANDATA DB2INST1.LOADTEST

This runs an “ALTER TABLE DATA CAPTURE CHANGES;” on DB2 side enables the capability to see data changes of that table. Table loadtest is an example table where I will generate some DML stress later on to test initial load during changes on source.

Next is to create trails and processes:

EDIT PARAMS e_sa

Parameter file contents of extract:

EXTRACT E_SA
SOURCEDB SAMPLE, USERIDALIAS [email protected]
-- TRANSMEMORY DIRECTORY(./dirtmp, 20G, 50M), RAM 200M, TRANSRAM 100M, TRANSALLSOURCES 450M, RAMINCREMENT 5M
EXTTRAIL ./sa
REPORTCOUNT EVERY 2000 RECORDS, RATE
LOGALLSUPCOLS
UPDATERECORDFORMAT FULL
NOCOMPRESSUPDATES
NOCOMPRESSDELETES
TABLE DB2INST1.EMPLOYEE;
TABLE DB2INST1.DEPARTMENT;
TABLE DB2INST1.LOADTEST;

then:

ADD EXTRACT e_sa, TRANLOG, BEGIN NOW
ADD EXTTRAIL ./dirdat/sa, EXTRACT e_sa, MEGABYTES 100

START EXTRACT e_sa

With ggsci of Oracle GoldenGate for Oracle you can create replicat as follows:

EDIT PARAMS r_sa

Parameter file contents of replicat:

REPLICAT R_SA
USERIDALIAS c##[email protected]
DISCARDFILE ./dirrpt/R_SA.DSC, APPEND
DBOPTIONS INTEGRATEDPARAMS(PARALLELISM 2)
REPORTCOUNT EVERY 2000 RECORDS, RATE
MAP DB2INST1.*, TARGET FROMDB2.*;

then:

DBLOGIN USERIDALIAS c##[email protected]
ADD REPLICAT R_SA, INTEGRATED, EXTTRAIL ../gg19db2/dirdat/sa

Note, that both GoldenGate instances are running on same host and thus I do not need a pump process to transfer any trails. The replicat (Oracle GoldenGate for Oracle) directly accesses trail files written by extract process (Oracle GoldenGate for DB2).

You might start it with:

START REPLICAT r_sa

But as we do not have initial load yet it would probably fail right away, just wait with that step until later.

Initial Load

There are multiple possibilities for initial load. The challenge for the replication is that the data in source might always be changing. You could request a stop of these changes in the source for a while, but nowadays we expect have everything available as much as we can. Fortunately the solution for this is kind of easy (at least if Oracle is the source).

a) Get a consistency point from source. In GoldenGate this is called CSN Commit Sequence Number and with Oracle databases this is identical to SCN, system change number or system commit number. A CSN is something which monotonically increases all the time and is used to find consistency points.
For DB2 this is different, for DB2 on z/OS it is RBA, whereas RBA is the 6-byte relative byte address of the commit record within the transaction log, for DB2 on LUW it is derived from LRI (Log Record Identifier) and LSN (Log Sequence Number).
In Oracle you query it with:

SELECT current_scn FROM v$database

In DB2 LUW you do:

SELECT applid_holding_oldest_xact, oldest_tx_lsn, current_lsn
FROM table(MON_GET_TRANSACTION_LOG(-2)) AS t

In DB2 z/OS:

no real idea. Please tell me if you know.
I can get RBAs from sysibm.syscopy (column start_rba) or from sysibm.syslgrnx (columns lgrsrba, lgrspba) though not sure what is the best or most precise approach.
I can also get it by looking at the already created extract in ggsci, e.g. info extract e_sa.

b) Get the data from source as of this SCN. This is called flashback query. E.g.
In Oracle this would be:

SELECT * FROM employees AS OF SCN

c) Load the data to target.
You can get it via db link, data pump, custom application or scripts, …
Or even with GoldenGate extracts and replicats itself.
See https://docs.oracle.com/en/middleware/goldengate/core/21.3/admin/instantiating-oracle-goldengate-initial-load.html
Example with DB link with the help of Oracle database gateway;

INSERT /*+ APPEND */ INTO employees
SELECT * FROM [email protected]
;
COMMIT;

d) Start the change data delivery which basically contains all changes from even before the load, but starts to apply changes only as of the magic CSN/SCN pointer.

START REPLICAT r_sa, AFTERCSN <the csn>

You can even introduce different instantiation CSNs for individual tables by introducing filters in MAP statement of replicat. e.g.

MAP DB2INST1.EMPLOYEE, TARGET FROMDB2.EMPLOYEE, FILTER(@GETENV('TRANSACTION','CSN') > 1144787);
MAP DB2INST1.DEPARTMENT, TARGET FROMDB2.DEPARTMENT, FILTER(@GETENV('TRANSACTION','CSN') > 1144799);
MAP DB2INST1.LOADTEST, TARGET FROMDB2.LOADTEST, FILTER(@GETENV('TRANSACTION','CSN') > 1145124);

or by usage of DBOPTIONS ENABLE_INSTANTIATION_FILTERING if target is Oracle.
in combination with dbms_apply_adm.set_XXX_instantiation_scn
See also “Oracle GoldenGate Best Practices: Instantiation from an Oracle Source Database (Doc ID 1276058.1)”.

In most cases where Oracle is source and target you would simply use Data Pump with a flashback scn to get this all running.

The HANDLECOLLISIONS Parameter

For cases where we cannot get the data from source consistently Oracle GoldenGate provides a parameter called HANDLECOLLISIONS.
See https://docs.oracle.com/en/middleware/goldengate/core/21.1/reference/handlecollisions-nohandlecollisions.html
The docs make clear that the purpose of this parameter is to resolve the problem of the always changing data. All collisions which might happen due to the changes are automatically resolved. The logic around this is described in the docs and seems quite sophisticated. I have to admit, that even after reading it about 5 times, I still had some doubts whether we will get complete consistency after an initial load with that handlecollisions approach.

So now we get to the main purpose of this blog post. I want to test that approach even under stress. I will run an initial load, one of the tables called loadtest is always changing. There are a lot of inserts, updates, deletes. The table contents is always in motion. A SELECT of the table of approximately 300.000 rows takes long enough, that in-between a bunch of changes already happened again.

The structure of the table in DB2:

CREATE TABLE loadtest (
  id INTEGER NOT NULL,
  c  VARCHAR(50),
  da TIMESTAMP);
ALTER TABLE loadtest ADD CONSTRAINT pk_loadtest PRIMARY KEY (id);

The structure of the table in Oracle:

CREATE TABLE loadtest (
  id /* was INTEGER */  NUMBER NOT NULL,
  c  /* was VARCHAR */  VARCHAR2(50),
  da /* was TIMESTMP */ TIMESTAMP
);
ALTER TABLE loadtest ADD CONSTRAINT pk_loadtest PRIMARY KEY (id);

The stress is doing inserts which add records with new higher ids. The updates change da and c column, but never the PK = id column. The deletes remove records preferably in the range of lower numbers, but randomly.

First, the initial load, naively without any preparation:

query CSN:
on DB2 side I have a view to show me current LSN which maps to CSN:

CREATE VIEW lsn AS
SELECT applid_holding_oldest_xact, 
       oldest_tx_lsn, 
       current_lsn 
FROM table(MON_GET_TRANSACTION_LOG(-2)) AS t

This works at least with DB2 on LUW. As I said not sure about DB2 on z/OS here.

SELECT current_lsn FROM [email protected]

Load data to empty table via DB link and Oracle Database Gateway for ODBC.

INSERT /*+ APPEND */ INTO loadtest
SELECT * FROM [email protected]
;
COMMIT;

then:

START REPLICAT r_sa, AFTERCSN <the_lsn_or_rba_from_DB2>

from GoldenGate replicat’s log aka report file:

2022-05-06 15:12:15  WARNING OGG-02544  Unhandled error (ORA-26787: The row with key ("ID") = (2863) does not exist in table FROMDB2.LOADTEST
ORA-01403: no data found) while processing the record at SEQNO 8, RBA 170282 in Integrated mode. REPLICAT will retry in Direct mode.

2022-05-06 15:12:15  WARNING OGG-01004  Aborted grouped transaction on ENSIFERUM.FROMDB2.LOADTEST, Database error 1403 (No data found).

2022-05-06 15:12:15  WARNING OGG-01003  Repositioning to rba 170282 in seqno 8.

2022-05-06 15:12:15  WARNING OGG-01154  SQL error 1403 mapping DB2INST1.LOADTEST to ENSIFERUM.FROMDB2.LOADTEST No data found.
2022-05-06 15:12:15  ERROR   OGG-01296  Error mapping from DB2INST1.LOADTEST to ENSIFERUM.FROMDB2.LOADTEST.

In the discard file I find:

Current time: 2022-05-06 15:12:15
Discarded record from action ABEND on error 1403

No data found
Aborting transaction on /u00/app/oracle/product/gg19/../gg19db2/dirdat/sa beginning at seqno 8 rba 170,282
                         error at seqno 8 rba 170282
Problem replicating DB2INST1.LOADTEST to ENSIFERUM.FROMDB2.LOADTEST.
Record not found
Mapping problem with delete record (target format) SCN:131825...
*
ID = 2863
000000: 32 38 36 33                                     |2863            |

C = two thousand eight hundred sixty-three
000000: 74 77 6f 20 74 68 6f 75 73 61 6e 64 20 65 69 67 |two thousand eig|
000010: 68 74 20 68 75 6e 64 72 65 64 20 73 69 78 74 79 |ht hundred sixty|
000020: 2d 74 68 72 65 65                               |-three          |

DA = 2022-04-26 15:33:12.696348000
000000: 32 30 32 32 2d 30 34 2d 32 36 20 31 35 3a 33 33 |2022-04-26 15:33|
000010: 3a 31 32 2e 36 39 36 33 34 38 30 30 30          |:12.696348000   |

*

Process Abending : 2022-05-06 15:12:20.174628

So basically this is a collision. The init load query started. A record was deleted. This change is visible in the change data, in the trail. The record itself was not in the init load query as it turns out that the query came to that row a little later where it was already deleted.

Solution: set HANDLECOLLIONS.

I add the parameter HANDLECOLLISIONS in replicats parameter file, right before all MAP statements. Then start the abended replicat.
Here is a list of warnings I got in the log/report now:

OGG-02544  Unhandled error (ORA-26787: The row with key ("ID") = (2863) does not exist in table FROMDB2.LOADTEST
OGG-02544  Unhandled error (ORA-26787: The row with key ("ID") = (1704) does not exist in table FROMDB2.LOADTEST
OGG-02544  Unhandled error (ORA-26787: The row with key ("ID") = (589) does not exist in table FROMDB2.LOADTEST
OGG-02544  Unhandled error (ORA-26799: unique constraint FROMDB2.PK_LOADTEST violated for table FROMDB2.LOADTEST with column values ("ID") = (300287)
OGG-02544  Unhandled error (ORA-26787: The row with key ("ID") = (1660) does not exist in table FROMDB2.LOADTEST
...
OGG-02544  Unhandled error (ORA-26799: unique constraint FROMDB2.PK_LOADTEST violated for table FROMDB2.LOADTEST with column values ("ID") = (300335)

In total I had 151 such errors, whereas 49 where uniqueness issues and 102 issues with “no data found”.

But the replicat survived and after a while everything ran smooth without any further warnings in report file. And then I wanted to be sure whether the data is consistent. I stopped the load on table loadtest. I stopped it mainly to be able to have a reliable comparison of data. In a real-world you again might have an issue here. How to compare always changing data? This you can manage by listing potential differences first and then drill into those deeper. Or by just comparing such potential differences from one comparison run to the next.

A comparison for my case looks like:

SELECT COUNT(*) FROM(
(SELECT * FROM [email protected]
MINUS
SELECT * FROM loadtest
)
UNION ALL
(SELECT * FROM loadtest
MINUS
SELECT * FROM [email protected]
))

This query will return zero if the contents are the same, which was the case for me. DB2 and Oracle table contents are the same. So I am happy as HANDLECOLLISIONS seems to be fine. Still I will remove it from replicat parameter file whenever initial load
is finished, to be able to notice any other divergences later.

Summary

Of course this is not a formal proof that HANDLECOLLSIONS will always handle the challenge of in-flight data correctly, but at least I could not prove the opposite, that it fails. I repeated the tests several times and it never failed to give me data which is absolutely convergent.