How to change the VIP interface in a 10g cluster

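At 2 a.m. I set off for the customer's site to work overtime. The network adapters had been changed, so the VIP resources had to be reconfigured.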

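The system runs IBM AIX 5L with 10gR2 RAC. When it was first installed, the customer configured HACMP with a primary and a standby network adapter, and HACMP managed both of them. When the public adapter failed, HACMP moved the IP address to the standby adapter. The 10g Clusterware VIP cannot cope with this: as soon as the IP fails over, the VIP goes down. The customer had set things up this way to avoid a single point of failure at the adapter, but with HACMP managing the adapters, the VIP was still a single point of failure. So the customer decided to reconfigure the adapters tonight, bundling the former primary and standby adapters into a single public adapter. This changes the interface name, so the VIP resources have to be reconfigured.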

The steps for changing the VIP resources are roughly as follows.

1. Stop the database and the nodeapps resources

$ srvctl stop database -d grid 
$ srvctl stop nodeapps -n node1
$ srvctl stop nodeapps -n node2

2. Update the interface information in the OCR
Delete the old entry:

$ORA_CRS_HOME/bin/oifcfg delif -global eth1

Add the new entry:

$ORA_CRS_HOME/bin/oifcfg setif -global eth0/192.168.2.0:public

Check that the entry was added:

$ORA_CRS_HOME/bin/oifcfg getif
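The value after the interface name in `oifcfg setif` is the network address of the public subnet, not a host address. The sketch below derives it from the VIP address and netmask used in step 3 by ANDing each pair of octets. It assumes a POSIX shell and a dotted-quad netmask, and `subnet_of` is a hypothetical helper, not an Oracle tool.

```shell
#!/bin/sh
# subnet_of: hypothetical helper (not part of Oracle Clusterware).
# Prints the network address for a host IP and netmask by ANDing
# each pair of octets, i.e. the subnet form that oifcfg setif takes.
subnet_of() {
    old_ifs=$IFS
    IFS=.
    # Split "a.b.c.d" and "w.x.y.z" into $1..$4 (IP) and $5..$8 (mask).
    set -- $1 $2
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Example: subnet_of 192.168.2.125 255.255.255.0   prints 192.168.2.0
```

For the values in this post, the result matches the `eth0/192.168.2.0:public` entry registered above.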

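3. Modify the nodeapps as root
The change must be made while the Oracle Clusterware stack is running. That is why step 1 uses srvctl stop nodeapps to stop the resources, rather than crsctl stop crs, which would stop the whole Clusterware.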

# srvctl modify nodeapps -n node1 -A 192.168.2.125/255.255.255.0/eth0
# srvctl modify nodeapps -n node2 -A 192.168.2.126/255.255.255.0/eth0

Check that the change took effect:

# srvctl config nodeapps -n  -a

4. Restart the nodeapps and the database

$ srvctl start nodeapps -n node1
$ srvctl start nodeapps -n node2
$ srvctl start database -d grid 
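The four steps can be collected into one script so that they always run in the same order. This is a sketch only. The database name `grid`, the node names, the interface names and the VIP specifications are the example values from this post. `run` and `vip_of` are hypothetical helpers, and with `DRY_RUN` set the script only prints the commands. Step 3 says `srvctl modify nodeapps` must run as root, so in practice the script would need to be split between the oracle and root users.

```shell
#!/bin/sh
# Sketch of the VIP interface change from this post, for two nodes.
# All values below are the example values used in the steps above.
DB=grid
NODES="node1 node2"

# run: hypothetical helper. Prints each command; executes it only when
# DRY_RUN is empty, and aborts the script on the first failure.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] && return 0
    "$@" || { echo "failed: $*" >&2; exit 1; }
}

# vip_of: hypothetical mapping from node name to the -A value
# (address/netmask/interface) used in step 3.
vip_of() {
    case $1 in
        node1) echo 192.168.2.125/255.255.255.0/eth0 ;;
        node2) echo 192.168.2.126/255.255.255.0/eth0 ;;
    esac
}

change_vip() {
    # 1. Stop the database and nodeapps; the Clusterware stack stays up.
    run srvctl stop database -d "$DB"
    for n in $NODES; do run srvctl stop nodeapps -n "$n"; done

    # 2. Replace the public interface definition in the OCR.
    run "$ORA_CRS_HOME/bin/oifcfg" delif -global eth1
    run "$ORA_CRS_HOME/bin/oifcfg" setif -global eth0/192.168.2.0:public
    run "$ORA_CRS_HOME/bin/oifcfg" getif

    # 3. Point each VIP at the new interface (must run as root).
    for n in $NODES; do run srvctl modify nodeapps -n "$n" -A "$(vip_of "$n")"; done
    for n in $NODES; do run srvctl config nodeapps -n "$n" -a; done

    # 4. Start the nodeapps and the database again.
    for n in $NODES; do run srvctl start nodeapps -n "$n"; done
    run srvctl start database -d "$DB"
}

# Preview the commands without executing anything:
#   DRY_RUN=1 change_vip
```

Keeping everything behind `run` makes a dry run print exactly the sequence above, which is a cheap way to review the plan before touching a production cluster.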

Oracle10gR2 RAC OCR & Voting Disk backup

In an Oracle 10gR2 RAC environment, the database itself is naturally backed up with RMAN. But how do you back up CRS and the ASM instance?

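Oracle automatically backs up the OCR, which holds the CRS configuration, and chooses by itself which node stores the backup files. The ocrconfig command shows where the most recent OCR backups are stored. Periodically copy the corresponding directory to tape, using the operating system's tar or the tape library's file-system backup feature, and the OCR backup is done.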

/oracle/crs/cdata>ocrconfig -showbackup

server1 2007/04/17 12:23:56 /oracle/crs/cdata/crs
server1 2007/04/17 08:23:55 /oracle/crs/cdata/crs
server1 2007/04/17 04:23:54 /oracle/crs/cdata/crs
server1 2007/04/16 08:23:50 /oracle/crs/cdata/crs
server2 2007/04/04 02:14:51 /oracle/crs/cdata/crs
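To automate the tape copy, you can pull the node and directory of the newest automatic backup out of the `ocrconfig -showbackup` listing. This minimal sketch assumes the four-column layout shown above (node, date, time, directory). `latest_ocr_backup` is a hypothetical helper, and the tar archive path in the comment is a placeholder.

```shell
#!/bin/sh
# latest_ocr_backup: hypothetical helper. Reads `ocrconfig -showbackup`
# output on stdin and prints "<node> <directory>" for the newest backup.
# Assumes the four-column layout shown above; the date and time fields
# (YYYY/MM/DD HH:MM:SS) sort correctly as plain strings.
latest_ocr_backup() {
    awk 'NF == 4 {
             stamp = $2 " " $3
             if (stamp > newest) { newest = stamp; where = $1 " " $4 }
         }
         END { if (where != "") print where }'
}

# Example: find the node holding the newest backup, then archive that
# directory on that node (archive path is a placeholder):
#   ocrconfig -showbackup | latest_ocr_backup
#   tar -cf /orabackup/ocr_backup.tar /oracle/crs/cdata/crs
```

Comparing the timestamps avoids depending on the order in which `ocrconfig` happens to list the backups.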

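For the voting disk, use dd to copy it to the file system, then back the copy up to tape with the tape library's file-system backup feature as well. The crsctl query command shows the device name of the voting disk currently in use.

/oracle/crs/cdata>crsctl query css votedisk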
0. 0 /dev/vote_disk

/oracle/crs/cdata>dd if=/dev/vote_disk of=/orabackup/vote_disk
501760+0 records in.
501760+0 records out.
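CSS keeps writing heartbeats to the voting disk, so a byte-for-byte comparison against the live device is not a reliable check. Instead, the sketch below confirms that dd exited cleanly and that its "records in" and "records out" counts match, as they do in the output above. It assumes a POSIX shell and dd's usual English summary messages, and `backup_votedisk` is a hypothetical wrapper.

```shell
#!/bin/sh
# backup_votedisk: hypothetical wrapper around the dd copy shown above.
# $1 = voting disk device (or any file), $2 = backup file.
# Succeeds only if dd exits cleanly and its "records in" and
# "records out" counts match (assumes dd's usual English summary).
backup_votedisk() {
    # dd prints its summary on stderr; keep it, discard stdout.
    summary=$(dd if="$1" of="$2" 2>&1 >/dev/null) || return 1
    recs_in=$(printf '%s\n' "$summary" | awk '/records in/ { print $1 }')
    recs_out=$(printf '%s\n' "$summary" | awk '/records out/ { print $1 }')
    [ -n "$recs_in" ] && [ "$recs_in" = "$recs_out" ]
}

# Example with the device reported by `crsctl query css votedisk`:
#   backup_votedisk /dev/vote_disk /orabackup/vote_disk && echo "copy OK"
```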

Finally, the ASM instance. ASM has no data files of its own, so backing up the ASM ORACLE_HOME directory at the file-system level is enough.
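Here is a minimal sketch of that file-system backup with tar, assuming a POSIX shell. `backup_home` is a hypothetical helper. It stores paths relative to the home's parent directory so the archive can be restored in place, and it reads the archive back once as a basic sanity check.

```shell
#!/bin/sh
# backup_home: hypothetical helper. Archives a directory such as the
# ASM ORACLE_HOME into a tar file (paths stored relative to its parent
# directory), then lists the archive to confirm it can be read back.
# $1 = directory to back up, $2 = archive path (use an absolute path).
backup_home() {
    parent=$(dirname "$1")
    name=$(basename "$1")
    (cd "$parent" && tar -cf "$2" "$name") || return 1
    tar -tf "$2" >/dev/null
}

# Example, assuming ORACLE_HOME points at the ASM home:
#   backup_home "$ORACLE_HOME" /orabackup/asm_home.tar
```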


Oracle 10.2.0.3 RAC Reboot due to system time change

While testing Oracle 10.2.0.3 RAC, I found that if the system time on a node is changed by more than 1.5 seconds, that node is automatically rebooted.

What a brutal way to handle it...

For the detailed mechanism, see Metalink Note 308051.1 (Internal Only):

The OPROCD executable sets a signal handler for the SIGALRM handler and sets the interval timer based on the to-millisec parameter provided. The alarm handler gets the current time and checks it against the time that the alarm handler was last entered. If the difference exceeds (to-millisec + margin-millisec), it will fail; the production version will cause a node reboot.
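The check described in the note can be modelled as a small function. The handler compares wall-clock readings, so moving the system time forward between two alarms inflates the measured gap just as a scheduling stall would. This is only a model of the quoted logic, not OPROCD itself. `oprocd_check` is hypothetical, times are in milliseconds, and the values 1000 and 500 for to-millisec and margin-millisec are assumptions chosen to match the 1.5-second figure quoted later in this post.

```shell
#!/bin/sh
# oprocd_check: hypothetical model of the OPROCD alarm-handler test.
# $1 = time (ms) the handler was last entered, $2 = current time (ms),
# $3 = to-millisec, $4 = margin-millisec.
# Succeeds if the gap is acceptable; fails where the production
# OPROCD would reboot the node.
oprocd_check() {
    gap=$(( $2 - $1 ))
    [ "$gap" -le $(( $3 + $4 )) ]
}

# A normal tick: the handler runs again about 1000 ms later.
oprocd_check 100000 101000 1000 500 && echo "ok"
# The clock is moved 2 s forward between ticks, so the measured
# gap becomes 3000 ms.
oprocd_check 101000 104000 1000 500 || echo "would reboot"
```

Run as a script, this prints `ok` and then `would reboot`.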

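I tried editing the OPROCD settings in /etc/init.cssd, set DISABLE_OPROCD to TRUE and rebooted the system. Afterwards there was no oprocd process among the system processes, yet after I changed the system time the machine was still rebooted.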

The note also says that if OPROCD is started in non-fatal mode, it only writes a log entry instead of rebooting the machine. Note:265769.1 describes how to switch to non-fatal mode, but I did not try it.

In fatal mode, OPROCD will reboot the node if it detects excessive wait. In Non Fatal mode, it will write an error message out to the file .oprocd.log in one of the following directories.

In the end, I disabled the entire cssd process. That avoids the reboot caused by changing the system time.

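Oracle 10g CRS really is rather heavy-handed. In an earlier test, pulling the cable out of the private-IP network adapter rebooted the operating system, and this time even changing the system time reboots it. Does it think these machines are running Windows? Rebooting a UNIX server is a big deal, yet CRS treats it as casually as having a meal and reboots every so often.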

The following material describes the conditions under which each of the three Oracle CRS daemons reboots the machine.

Oracle clusterware has the following three daemons which may be responsible for panicking the node. It is possible that some other external entity may have rebooted the node. In the context of this discussion, we will assume that the reboot/panic was done by an Oracle clusterware daemon.

* Oprocd – Cluster fencing module
* Cssd – Cluster synchronization module which manages node membership
* Oclsomon – Cssd monitor which will monitor for cssd hangs

OPROCD This is a daemon that only gets activated when there is no vendor clusterware present on the OS. This daemon is also not activated to run on Windows/Linux. This daemon runs a tight loop and if it is not scheduled for 1.5 seconds, will reboot the node.
CSSD This daemon pings the other members of the cluster over the private network and Voting disk. If this does not get a response for Misscount seconds and Disktimeout seconds respectively, it will reboot the node.
Oclsomon This daemon monitors the CSSD to ensure that CSSD is scheduled by the OS, if it detects any problems it will reboot the node.
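The CSSD rule above can be modelled as a small decision function. `cssd_check` is hypothetical. It takes the seconds since the last reply over the private network and the seconds since the last successful voting-disk I/O. The misscount and disktimeout values in the usage lines (30 and 200) are illustrative placeholders, not values read from a real cluster.

```shell
#!/bin/sh
# cssd_check: hypothetical model of the CSSD eviction rule.
# $1 = seconds since last reply over the private network
# $2 = seconds since last successful voting-disk I/O
# $3 = misscount, $4 = disktimeout (seconds)
# Prints the decision; returns non-zero when the node would be rebooted.
cssd_check() {
    if [ "$1" -ge "$3" ]; then
        echo "reboot: network heartbeat missed for $1s (misscount $3)"
        return 1
    fi
    if [ "$2" -ge "$4" ]; then
        echo "reboot: voting disk unreachable for $2s (disktimeout $4)"
        return 1
    fi
    echo "ok"
}

# Usage (placeholder thresholds):
#   cssd_check 5 1 30 200    # prints "ok"
#   cssd_check 31 1 30 200   # private network cable pulled: reboot
```

The second usage line matches the earlier test in this post, where pulling the private-network cable rebooted the node.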

For more discussion, see itpub note 747833.

As for why the node reboots, Wing Hong gave a clear, accessible explanation of fencing in the thread above. An excerpt follows.

Fencing is a very important concept in clusters. Not sure I can explain it clearly in a few words. Anyhow, forget about RAC first and just look at the generic shared-disk cluster issue.

Let's say the cluster has only two nodes. Both share the disk, i.e. they both have the right to write to the same disk.

So this shared access must be coordinated.

However, suppose this coordination is lost for some reason. It can be any reason: a communication process hanging, you unplugging the network cable between them, etc.

At this moment both nodes are functioning normally except for the lost coordination between them.

Then the cluster has two problems to solve:
1. Who will be the members of the new incarnation of the cluster? This is the split-brain issue.

2. Once we decide who will remain in the cluster and who will go, how can we prevent the departing node from doing something harmful to the cluster? This is the fencing issue. Bear in mind that the departing node is working perfectly normally except for the coordination part, so it can still write to the shared disk.

There are a couple of approaches to fencing:

1. Server fencing, in cluster terms Shoot The Other Node In The Head (STONITH), i.e. the good node kills the departing node.

(The way Oracle does it: the node reboots itself. Once it has gone through the reboot process, it simply tries to rejoin the cluster, and the cluster can decide whether to accept it or not.)

2. I/O fencing. Rather than trying to kill the node, this approach works on the disk side to block the departing node's access to the disk. Sun and Veritas have solutions along these lines.

HTH. Also try searching Google for terms like “fencing”, “split brain” and “amnesia”.
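Wing Hong's two questions can be illustrated with a toy model of two sub-clusters that can no longer see each other. Everything here is an assumption for illustration. The tie-break rule (the larger sub-cluster survives; on a tie, the one containing the lowest node number survives) is commonly described for Oracle Clusterware but is not stated in the excerpt. `survivor` and `fence_action` are hypothetical names. `fence_action` follows the Oracle behaviour described in the excerpt: the losing node reboots itself and then tries to rejoin.

```shell
#!/bin/sh
# Toy split-brain resolver. Sub-clusters are given as space-separated
# lists of node numbers; all names and rules here are illustrative.

count_nodes() { set -- $1; echo $#; }

min_node() {
    min=
    for n in $1; do
        if [ -z "$min" ] || [ "$n" -lt "$min" ]; then min=$n; fi
    done
    echo "$min"
}

# survivor: prints A or B, the sub-cluster that stays in the cluster.
# Assumed rule: more nodes wins; on a tie, the lowest node number wins.
survivor() {
    ca=$(count_nodes "$1"); cb=$(count_nodes "$2")
    if [ "$ca" -gt "$cb" ]; then echo A
    elif [ "$cb" -gt "$ca" ]; then echo B
    elif [ "$(min_node "$1")" -lt "$(min_node "$2")" ]; then echo A
    else echo B
    fi
}

# fence_action: server fencing in the Oracle style from the excerpt;
# $1 = this node's side, $2 = the winning side.
fence_action() {
    if [ "$1" = "$2" ]; then echo "continue"; else echo "reboot and rejoin"; fi
}

# Two nodes lose the interconnect: equal halves, node 1's side stays.
#   survivor "1" "2"       -> A
#   fence_action B A       -> reboot and rejoin
```

The model only covers the membership decision and the losing side's reaction. I/O fencing, the second approach in the excerpt, would instead block the losing node at the storage layer.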