Do I still need bonding for the cluster interconnect?


Traditionally, high availability for the Oracle cluster interconnect has been achieved with OS-level methods, such as the Linux kernel bonding module.

 

image

The idea is simple: at least two network cards are aggregated into one bonded logical interface, using one of the available modes such as round-robin, active-backup or balance-xor.

 

image
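As an illustration only, a minimal active-backup bond on a RHEL/OEL-style system could look roughly like the sketch below (device names, the IP address and the miimon value are assumptions, not taken from this post):

# /etc/sysconfig/network-scripts/ifcfg-bond0 (hypothetical private interconnect bond)
DEVICE=bond0
IPADDR=192.168.100.11
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth1 (one of the slave NICs; the second slave looks the same)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none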

With the first 11g Release 2 patch set (11.2.0.2) Oracle introduced a new, operating-system-independent feature called Redundant Interconnect. Actually a very good idea. It enables load balancing and high availability across multiple (up to four) NICs without any other OS technology (not available on Windows – no UDP support). You can enable it during installation by selecting up to four NICs with the Interface Type 'Private', or afterwards with the oifcfg tool.

  

image
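For example, to check the current setup and register an additional NIC as a private interface after the installation, oifcfg can be used along these lines (interface name and subnet are placeholders; the new interface becomes active after a clusterware restart):

# run as the grid infrastructure owner
oifcfg getif
oifcfg setif -global eth2/10.20.0.0:cluster_interconnect
oifcfg getif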

 

For Redundant Interconnect, Oracle internally uses two features:

  • Multicast network communication
    • During CRS bootstrap, multicasting is used to find the cluster peer nodes
    • After that, the network communication is switched to unicast
    • Oracle uses the multicast network 230.0.1.0. With patch #9974223 on top of 11.2.0.2, multicast also works on the 224.0.0.251 network (currently available only for Linux x86-64)
  • Link-local addresses
    • Link-local addresses (LLA, 169.254.0.0/16) are used as HAIPs on every cluster interconnect NIC

image

 

As you can see in the example, on every NIC used for the cluster interconnect the Clusterware starts and uses (at least) one highly available IP (HAIP) address. Note:

  • Starting with 3 NICs in the configuration there will be 4 HAIPs
  • Any NIC outage leads to HAIP relocation (not only on the affected node!)
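
A quick way to verify this on a node is sketched below (addresses and the exact output are illustrative): the HAIPs appear as additional 169.254.x.x aliases on the private NICs, they are managed by the ora.cluster_interconnect.haip resource, and the instances report them in gv$cluster_interconnects.

# link-local HAIP aliases on the private NICs
ifconfig -a | grep -B1 "169.254"

# the clusterware resource managing the HAIPs
crsctl stat res ora.cluster_interconnect.haip -init -t

# interconnect addresses actually used by the ASM/DB instances
sqlplus -s / as sysdba <<EOF
select inst_id, name, ip_address from gv\$cluster_interconnects;
EOF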

 

Now back to the question: do I still need bonding for the cluster interconnect? The answer: as of 11.2.0.2, yes, I still recommend it. Why?

If your cluster interconnect NICs are set up in the same network segment, you can experience issues with the ARP cache not being refreshed after an HAIP relocation. Have a look at the example below. The HAIP 169.254.147.95 has been relocated to eth1, but according to the ARP cache on the other node the HAIP is still reachable via eth3 (a different MAC address). The final result –> node eviction. According to Oracle Support this is BUG 10389682.

 

image
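To check whether a stale ARP entry is the culprit, the neighbour cache can be compared with the NIC that currently carries the HAIP, e.g. (Linux commands; the address is taken from the example above):

# what the local ARP cache believes about the HAIP
arp -n | grep 169.254.147.95
ip neighbor show | grep 169.254.147.95

# which NIC (and MAC) actually carries the HAIP right now
ifconfig | grep -B1 169.254.147.95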

The workaround is to use different network segments for the cluster interconnect NICs (a quick check is sketched after the list), e.g.:

  • eth0 on all cluster nodes ==> network subnet 1
  • eth1 on all cluster nodes ==> network subnet 2
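
On each node, oifcfg iflist lists the network interfaces and their subnets, so it is easy to confirm that the two private NICs really live in separate subnets (the output below is illustrative):

oifcfg iflist
# eth0  192.168.10.0
# eth1  192.168.20.0
# eth3  10.1.5.0        <-- public network, shown for comparison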

With this configuration the ARP cache issue is solved, but if you lose eth0 on one server and, at the same time, eth1 on the other one, you will have a problem.

image

 

It’s important to mention that the new 11.2.0.2 feature is used in any configuration, even if you use bonding. In this case the cluster will bring one HAIP online on the bonded device. The management of the NICs and the ARP requests is done, as before, by the OS.

 

image
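As a rough sketch of that combination (device name and addresses are assumptions): the bond is the only registered private interface, the clusterware puts a single HAIP alias on top of it, and failover between the physical NICs stays with the bonding driver.

oifcfg getif -type cluster_interconnect
# bond0  192.168.100.0  global  cluster_interconnect

ifconfig bond0:1
# bond0:1  Link encap:Ethernet  HWaddr ...
#          inet addr:169.254.23.41  Bcast:169.254.255.255  Mask:255.255.0.0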

Comments
  • Hi Robert,

    good to see your blog online. Nice entry.

    Keep on blogging!

    Mathias

  • Hi Mathias,

    Thanks for this post. But I had a query:

    If we do not use bonding and instead use different network segments, then when eth0 fails, the HAIP will move to eth1, but on the other side it will still be on eth0. In that case, how will it communicate? Because the physical paths are different:

    eth0<==>switch1<==>eth0

    eth1<==>switch2<==>eth1

    So as we see, there is no physical path between eth0 and eth1. Should we have a cross cable between switch1 and switch2? If yes, then it is all one network segment, isn't it? Then we cannot have 2 different network segments, isn't it?

    Thanks.

  • Sorry, that should have been Robert and not Mathias. Apologies.

  • Hi,

    in the case you mentioned:

    - two network segments

    - eth0 (net1) on server 1 failed

    The cluster will:

    on server 1:

    - relocate the failed HAIP from eth0 to eth1

    - update the ARP cache

    on server 2:

    - relocate the HAIP from eth0 to eth1 (although there are no problems with eth0)

    - update the ARP cache

    Note: in case of a NIC failure the clusterware relocates HAIPs not only from the failed NIC on one node, but on all cluster nodes. For the cluster communication (heartbeats, cache fusion), the cluster (from 11.2.0.2 on) uses the HAIPs, not the IPs you configured for the NICs.

    HTH,

    Robert      

  • Thanks Robert. You are right. A few days back I tested it and it was exactly as you mentioned, i.e. it moves from eth0 to eth1 on both nodes.

    But while testing I observed that, even after adding the second interface as cluster interconnect, Oracle assigned an HAIP only on the first interface. I came across the following link: www.esosys.net/HAIP-2011-03-04.htm

    where Tony initially faced the same behaviour, but after understanding it from you and restarting (the clusterware is brought down on all nodes together and then restarted), it worked for him. But it didn't work for me :( Restarting the cluster or even restarting the nodes did not help.

    Any idea what could it be?

    Thanks.

  • Hi,

    Have a look at the orarootagent log file (orarootagent_root.log). It's quite useful for analyzing problems with HAIPs.

  • Thanks Robert.

    I actually had a look at this very file and couldn't find anything (no errors; in fact there was no mention of the second interface anywhere). But I will have a second, deeper look at it.

    Thanks.

  • I have an issue on Solaris when the private interconnect has 2 Ethernet interfaces connected to 2 switches, yes, using different segments (192.168.130.x, 192.168.131.x).

    The behaviour is: even when the second node brings the cluster down with crsctl stop crs and tries to rejoin with crsctl start crs, this node can't join the cluster :(.

    During the error I found entries like this in the crfmond.log file:

    2011-08-05 01:54:09.806: [ CRFMOND][9]Thread nlist running

    2011-08-05 01:54:09.806: [    CRFM][9]crfm_register4: publishing data not available

    2011-08-05 01:54:09.807: [    CRFM][1]crfm_listeninit: endp listening  tcp://192.168.130.105:61020

    2011-08-05 01:54:10.829: [ CRFMOND][1]Loggerd Started

    2011-08-05 01:54:10.829: [ CRFMOND][13]Finding loggerd, consulting with neighbor soaidmdb2, total 2...

    2011-08-05 01:54:10.834: [    CRFM][13]crfm_connect_to: send fail(gipcret: 13)

    2011-08-05 01:54:10.834: [    CRFM][13]crfmctx dump follows

    2011-08-05 01:54:10.834: [    CRFM][13]****************************

    2011-08-05 01:54:10.834: [    CRFM][13]crfm_dumpctx: connection local name: tcp://0.0.0.0:27764

    2011-08-05 01:54:10.834: [    CRFM][13]crfm_dumpctx: connection peer name:  tcp://192.168.130.106:61020

    2011-08-05 01:54:10.834: [    CRFM][13]crfm_dumpctx: connaddr:  tcp://192.168.130.106:61020

    Is it related to the ARP cache too? Or something else?

    thanks.

  • Hi Robert,

    This is a very useful article.

    My question: according to your observation, in case of a NIC failure the clusterware relocates the HAIPs on all cluster nodes, not only from the failed NIC. Does it mean that all similarly named interfaces on the different nodes would have the same HAIP value assigned to them?

    E.g. eth1 on all nodes could have the HAIP value 169.254.210.1.

    Similarly, eth2 on all nodes could have the HAIP value 169.254.118.222.

    Is my understanding correct?

    Thanks much.

  • I noticed on 11.2.0.3 that when one NIC fails, the 169.254 address moves to the other NIC as expected; however, it also moves on the other node, which was not impacted. Case open at MOS to see if this is normal.

    I think I will stick to IPMP... not sure this is production ready yet.

    Philippe

  • The same behaviour can be observed with 11.2.0.2. According to Oracle it is expected.

  • Hi NP,

    > E.g. eth1 on all nodes could have HAIP value of 169.254.210.1.

    >      Similarly eth2 on all nodes could have HAIP value of 169.254.118.222.

    > Is my understanding correct?

    No, it's not possible. Every HAIP must be unique within the cluster. An example from a two-node cluster:

    oracle@black:~/ [+ASM2] oifcfg getif -type cluster_interconnect

    eth1  10.10.0.0  global  cluster_interconnect

    eth2  10.20.0.0  global  cluster_interconnect

    oracle@black:~/ [+ASM2] ifconfig | grep -E 'eth1:|eth2:' -A 1

    eth1:1    Link encap:Ethernet  HWaddr 52:54:00:6F:6C:74  

             inet addr:169.254.97.202  Bcast:169.254.127.255  Mask:255.255.128.0

    --

    eth2:1    Link encap:Ethernet  HWaddr 52:54:00:57:6C:5E  

             inet addr:169.254.186.35  Bcast:169.254.255.255  Mask:255.255.128.0

    oracle@white:~/ [+ASM1] ifconfig | grep -E 'eth1:|eth2:' -A 1

    eth1:1    Link encap:Ethernet  HWaddr 52:54:00:7A:A6:A0  

             inet addr:169.254.30.171  Bcast:169.254.127.255  Mask:255.255.128.0

    --

    eth2:1    Link encap:Ethernet  HWaddr 52:54:00:51:66:AD  

             inet addr:169.254.240.248  Bcast:169.254.255.255  Mask:255.255.128.0
