Traditionally, high availability for the Oracle cluster interconnect has been achieved with operating system methods, such as the Linux kernel bonding module.
The idea is simple: we aggregate at least two network cards into one bonded logical interface, using one of the available modes such as round-robin, active-backup, or balance-xor.
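As a minimal sketch, an active-backup bond on a RHEL-style system could be set up roughly as follows (interface names, the IP address, and file paths are illustrative assumptions, not taken from the original post):

```shell
# /etc/modprobe.d/bonding.conf -- load the bonding driver in active-backup
# mode (mode=1), polling link state every 100 ms
alias bond0 bonding
options bond0 mode=active-backup miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0 -- the logical bonded interface
#   DEVICE=bond0
#   IPADDR=192.168.100.1
#   NETMASK=255.255.255.0
#   BOOTPROTO=none
#   ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 analogously) --
# the physical NICs are enslaved to bond0 and carry no IP of their own
#   DEVICE=eth0
#   MASTER=bond0
#   SLAVE=yes
#   BOOTPROTO=none
#   ONBOOT=yes
```

With this configuration the OS, not the clusterware, handles NIC failover and the associated ARP updates.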
With the first 11g Release 2 patchset (11.2.0.2) Oracle introduced a new, operating-system-independent feature called Redundant Interconnect. Actually a very good idea. It enables load balancing and high availability across multiple (up to four) NICs without any additional OS technology (it is not available on Windows, which has no UDP support). You can enable it during installation by selecting up to four NICs as Interface Type "Private", or afterwards with the oifcfg tool.
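The post-installation route via oifcfg could look roughly like this (the interface names and subnets are assumptions for illustration only):

```shell
# Register two NICs as private interconnect interfaces; subnets and
# interface names are examples, not from the original post
oifcfg setif -global eth1/10.10.0.0:cluster_interconnect
oifcfg setif -global eth2/10.20.0.0:cluster_interconnect

# Verify what is registered as cluster interconnect
oifcfg getif -type cluster_interconnect
```

A clusterware restart is typically needed before the HAIPs appear on the newly added interfaces.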
Internally, Oracle uses two features for Redundant Interconnect:
As you can see in the example, on every NIC used for the cluster interconnect the Clusterware starts and uses (at least) one highly available IP (HAIP) address. Note:
Now back to the question: do I still need bonding for the cluster interconnect? The answer: as of 11.2.0.2, yes, I recommend it. Why?
If your cluster interconnect NICs are set up in the same network segment, you can experience issues with the ARP cache not being refreshed after an HAIP relocation. Have a look at the example below. The HAIP 169.254.147.95 has been relocated to eth1, but according to the ARP cache on the other node the HAIP is still available on eth3 (another MAC). Final result –> node eviction. According to Oracle Support: BUG 10389682.
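To see whether the neighbor cache on the surviving node still maps the relocated HAIP to the old MAC, you can compare the actual interface assignment with the cached entry (the addresses below reuse the example above and are purely illustrative):

```shell
# On the node where the HAIP was relocated: which NIC carries it now?
ip addr show | grep -B 2 '169.254.147.95'

# On the other node: which MAC does the ARP cache still hold for the HAIP?
arp -n 169.254.147.95          # or: ip neigh show 169.254.147.95

# If the cached MAC still belongs to the old NIC (eth3 in the example),
# the ARP cache is stale and traffic to the HAIP goes to the wrong card
```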
The workaround is to use different network segments for the cluster interconnect NICs, e.g.:
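For example, each interconnect NIC could live in its own subnet on both nodes (the subnets below are chosen purely for illustration):

```shell
# node 1                           node 2
# eth0: 192.168.10.1/24            eth0: 192.168.10.2/24   (segment 1, switch 1)
# eth1: 192.168.20.1/24            eth1: 192.168.20.2/24   (segment 2, switch 2)

# Registered with the clusterware; illustrative output of:
oifcfg getif -type cluster_interconnect
#   eth0  192.168.10.0  global  cluster_interconnect
#   eth1  192.168.20.0  global  cluster_interconnect
```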
With this configuration the issue with the ARP cache is solved, but if you lose eth0 on one server and, at the same time, eth1 on the other one, you will have a problem.
It’s important to mention that the new 11.2.0.2 feature will be used in any configuration, even if you use bonding. In this case the cluster will bring one HAIP online on the bonded device. Management of the NICs and the ARP requests will be done, as before, by the OS.
good to see your blog online. Nice entry.
Keep on blogging!
Thanks for this post. But I had a query:
If we do not use bonding and instead use different network segments, then when eth0 fails, the HAIP will move to eth1, but on the other side it will still be on eth0. In that case how will it communicate? Because the physical paths are different:
So as we see, there is no physical path between eth0 and eth1. Should we have a cross cable between switch1 and switch2? If yes, then it's all one network segment, isn't it? Then we cannot have two different network segments, isn't it?
Sorry, that should have been Robert and not Mathias. Apologies.
in the case you mentioned:
- two network segments
- eth0 (net1) on server 1 failed
The cluster will:
on server 1:
- relocate the failed HAIP from eth0 to eth1
- update the ARP cache
on server 2:
- relocate the HAIP from eth0 to eth1 (although there are no problems with eth0)
Note: in case of a NIC failure the clusterware relocates HAIPs not only on the node with the failed NIC, but on all cluster nodes. For the cluster communication (heartbeats, cache fusion) the cluster (from 11.2.0.2 on) will use the HAIPs, not the IPs you configured for the NICs.
Thanks Robert. You are right. A few days back I tested and it was exactly as you mentioned i.e. it moves from eth0 to eth1 on both nodes.
But while testing I observed that, even after adding the second interface as cluster interconnect, Oracle assigned an HAIP only on the first interface. I came across the following link: www.esosys.net/HAIP-2011-03-04.htm
where Tony too initially faced the same behaviour, but after understanding from you and restarting (clusterware brought down on all nodes together, then restarted), it worked for him. But it didn't for me :( Restarting the cluster or even rebooting the nodes did not help.
Any idea what could it be?
Have a look at the orarootagent log file (orarootagent_root.log). It's quite useful for analyzing problems with HAIPs.
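For instance, to pull the HAIP-related entries out of that log, something like the following could work (the path assumes a standard 11.2 Grid Infrastructure layout; your Grid home and the exact agent subdirectory, ohasd vs. crsd, may differ):

```shell
# GRID_HOME and the node name are placeholders -- adjust for your system
GRID_HOME=/u01/app/11.2.0/grid
NODE=$(hostname -s)

# Case-insensitive search for HAIP activity in the root agent log
grep -i 'haip' \
  "$GRID_HOME/log/$NODE/agent/ohasd/orarootagent_root/orarootagent_root.log"
```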
I actually had a look at this very file and couldn't find anything (no errors; in fact there was no mention of the second interface anywhere). But I will have a second, deeper look at it.
I have an issue on Solaris where the private interconnect has two Ethernet interfaces connected to two switches, yes, using different segments (192.168.130.x, 192.168.131.x).
The behavior is: even when the second node brings the cluster down with crsctl stop crs and tries to rejoin with crsctl start crs, this node can't join the cluster :(.
During the error, I found the following in the crfmond.log file:
2011-08-05 01:54:09.806: [ CRFMOND]Thread nlist running
2011-08-05 01:54:09.806: [ CRFM]crfm_register4: publishing data not available
2011-08-05 01:54:09.807: [ CRFM]crfm_listeninit: endp listening tcp://192.168.130.105:61020
2011-08-05 01:54:10.829: [ CRFMOND]Loggerd Started
2011-08-05 01:54:10.829: [ CRFMOND]Finding loggerd, consulting with neighbor soaidmdb2, total 2...
2011-08-05 01:54:10.834: [ CRFM]crfm_connect_to: send fail(gipcret: 13)
2011-08-05 01:54:10.834: [ CRFM]crfmctx dump follows
2011-08-05 01:54:10.834: [ CRFM]****************************
2011-08-05 01:54:10.834: [ CRFM]crfm_dumpctx: connection local name: tcp://0.0.0.0:27764
2011-08-05 01:54:10.834: [ CRFM]crfm_dumpctx: connection peer name: tcp://192.168.130.106:61020
2011-08-05 01:54:10.834: [ CRFM]crfm_dumpctx: connaddr: tcp://192.168.130.106:61020
Is it related to the ARP cache too, or something else?
This is a very useful article.
My question: according to your observation, in case of a NIC failure the clusterware relocates HAIPs from the failed NIC on all cluster nodes. Does it mean that all similarly named interfaces on different nodes would have the same value of HAIP assigned to them?
E.g. eth1 on all nodes could have HAIP value of 169.254.210.1.
Similarly eth2 on all nodes could have HAIP value of 169.254.118.222.
Is my understanding correct?
I noticed on 11.2.0.2 that when one NIC fails, the 169 address moves to the other NIC as expected; however, it also moves on the other node, which was not impacted. Case open at MOS to see if this is normal.
I think I will stick to IPMP... not sure this is production-ready yet.
The same behaviour can be observed with 11.2.0.3. According to Oracle it is expected.
> E.g. eth1 on all nodes could have HAIP value of 169.254.210.1.
> Similarly eth2 on all nodes could have HAIP value of 169.254.118.222.
> Is my understanding correct?
No, it's not possible. Every HAIP must be unique in a cluster. Example from a two node cluster:
oracle@black:~/ [+ASM2] oifcfg getif -type cluster_interconnect
eth1 10.10.0.0 global cluster_interconnect
eth2 10.20.0.0 global cluster_interconnect
oracle@black:~/ [+ASM2] ifconfig | grep -E 'eth1:|eth2:' -A 1
eth1:1 Link encap:Ethernet HWaddr 52:54:00:6F:6C:74
inet addr:169.254.97.202 Bcast:169.254.127.255 Mask:255.255.128.0
eth2:1 Link encap:Ethernet HWaddr 52:54:00:57:6C:5E
inet addr:169.254.186.35 Bcast:169.254.255.255 Mask:255.255.128.0
oracle@white:~/ [+ASM1] ifconfig | grep -E 'eth1:|eth2:' -A 1
eth1:1 Link encap:Ethernet HWaddr 52:54:00:7A:A6:A0
inet addr:169.254.30.171 Bcast:169.254.127.255 Mask:255.255.128.0
eth2:1 Link encap:Ethernet HWaddr 52:54:00:51:66:AD
inet addr:169.254.240.248 Bcast:169.254.255.255 Mask:255.255.128.0
Do I still need bonding for the cluster interconnect ? - Robert Bialek - Blogs - triBLOG