Chanel [K]

Facing the sea, with spring blossoms

Archive for the ‘RAC’ tag

How to resolve ORA-01034 when RAC failover

without comments

Today I was testing RAC failover on Oracle 9.2.0.8 on HP-UX IA64 at a customer site and ran into ORA-01034.

Symptoms:
After one instance in the RAC environment is shut down (with either shutdown abort or shutdown immediate), remote clients connecting to the RAC service through TNS intermittently get ORA-01034.

$ sqlplus system/oracle@prod

SQL*Plus: Release 9.2.0.8.0 - Production on Tue Nov 17 20:52:09 2009

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
HPUX-ia64 Error: 2: No such file or directory

The client-side TNS configuration is a perfectly ordinary client failover setup.

PROD  =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = VIP1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = VIP2)(PORT = 1521))
      (LOAD_BALANCE = yes)
    )
    (CONNECT_DATA =
      (SERVICE_NAME = prod)
      (FAILOVER_MODE=
       (TYPE=SELECT)
       (METHOD=BASIC))
    )
  )

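After puzzling over this for quite a while, I checked the server-side listener.ora carefully and found that GLOBAL_DBNAME was set. That parameter is only needed for Oracle7 and Oracle8, which could not yet register service names dynamically with the local listener. From Oracle9i onward, setting it breaks failover: a statically registered service stays known to the listener even after the instance is shut down, so the listener keeps handing connections to the dead instance instead of refusing them, and the client never moves on to the next address.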

Change the configuration in listener.ora from:

SID_LIST_LISTENER_PROD2 =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME=prod)
      (ORACLE_HOME = /oracle/product/9.2)
      (SID_NAME = prod2)
    )
  )

to:

SID_LIST_LISTENER_PROD2 =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /oracle/product/9.2)
      (SID_NAME = prod2)
    )
  )

Testing failover again, everything worked.

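Conclusions:
1. When the listener is still up but the database instance is down, failover works only for services that are dynamically registered with the listener.
2. GLOBAL_DBNAME affects failover.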
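
To spot this kind of static entry before a failover test, a small script can scan listener.ora for GLOBAL_DBNAME settings. The following is only a minimal sketch: the function name `static_global_dbnames` is made up for illustration, the sample text is the "before" configuration from this post, and the regular expression assumes the simple one-parameter-per-parenthesis layout shown above.

```python
import re

# Matches "(GLOBAL_DBNAME = value)" in any letter case, with optional spaces.
GLOBAL_DBNAME_RE = re.compile(
    r"\(\s*GLOBAL_DBNAME\s*=\s*([^()\s]+)\s*\)", re.IGNORECASE
)


def static_global_dbnames(listener_ora_text):
    """Return every GLOBAL_DBNAME value found in listener.ora text.

    Hypothetical helper, not part of any Oracle tool.
    """
    # Ignore comment lines, which start with '#'.
    active = [
        line
        for line in listener_ora_text.splitlines()
        if not line.lstrip().startswith("#")
    ]
    return GLOBAL_DBNAME_RE.findall("\n".join(active))


# The "before" configuration from the post.
BEFORE = """
SID_LIST_LISTENER_PROD2 =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME=prod)
      (ORACLE_HOME = /oracle/product/9.2)
      (SID_NAME = prod2)
    )
  )
"""

if __name__ == "__main__":
    for name in static_global_dbnames(BEFORE):
        print("static GLOBAL_DBNAME found:", name)
```

An empty result means no static GLOBAL_DBNAME entry was found in the text; it does not by itself prove that the service is dynamically registered.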

Written by kamus

November 17th, 2009 at 11:18 pm

Posted in Oracle RDBMS

How to identify the cluster name

with 5 comments

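When configuring Database Control for a RAC environment, you are asked for the cluster name. We know that a default Oracle installation names the cluster crs, but how do you confirm what the cluster name actually is?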

[oracle@dbserver1 oracle10g]$ emca -config dbcontrol db -cluster -EM_NODE dbserver1 -EM_SID_LIST intertol2,intertol3,intertol4
 
STARTED EMCA at Jul 27, 2009 4:20:25 PM
EM Configuration Assistant, Version 10.2.0.1.0 Production
Copyright (c) 2003, 2005, Oracle.  All rights reserved.
 
Enter the following information:
Database unique name: intertol
Database Control is already configured for the database intertol
You have chosen to configure Database Control for managing the database intertol
This will remove the existing configuration and the default settings and perform a fresh configuration
Do you wish to continue? [yes(Y)/no(N)]: Y
Listener port number: 1521
Cluster name:

Oracle in fact provides a utility, cemutlo, for retrieving the cluster name.

[oracle@dbserver1 ~]$ $ORA_CRS_HOME/bin/cemutlo -h
Usage: /u01/app/oracle/product/crs/bin/cemutlo.bin [-n] [-w]
        where:
        -n prints the cluster name
        -w prints the clusterware version in the following format:
                 <major_version>:<minor_version>:<vendor_info>
 
[oracle@dbserver1 ~]$ cemutlo -n
crs

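The cluster name can also be obtained indirectly with the ocrdump or ocrconfig commands; for details see the article 《怎么查安装CRS时设置的cluster name》 (How to find the cluster name set during CRS installation).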
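
If the cluster name is needed inside a script, the cemutlo call can be wrapped. This is only a sketch under assumptions: it relies on the ORA_CRS_HOME environment variable used above, the function names are invented for illustration, and the `run` parameter exists solely so the command can be replaced in a test. The version parser follows the `<major_version>:<minor_version>:<vendor_info>` format printed by `cemutlo -h`.

```python
import os
import subprocess


def cluster_name(crs_home=None, run=subprocess.run):
    """Return the cluster name printed by `cemutlo -n`.

    Hypothetical wrapper; `run` is injectable only for testing.
    """
    crs_home = crs_home or os.environ["ORA_CRS_HOME"]
    cemutlo = os.path.join(crs_home, "bin", "cemutlo")
    result = run([cemutlo, "-n"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def parse_clusterware_version(text):
    """Split the `cemutlo -w` output into its three documented fields."""
    major, minor, vendor_info = text.strip().split(":", 2)
    return {"major": major, "minor": minor, "vendor_info": vendor_info}


if __name__ == "__main__":
    # Requires a node where the CRS home and cemutlo actually exist.
    print(cluster_name())
```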

Written by kamus

July 27th, 2009 at 5:06 pm

Posted in Oracle RDBMS

4 Nodes Oracle10g RAC on Linux x86-64

with 9 comments

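It took two days to finish installing a 4-node Oracle10g RAC on Linux x86-64 for a customer, using OCFS2 to store the data files as well as the OCR and voting disk. Here is a summary of the problems encountered.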

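1. I followed the official Oracle installation guide almost exactly, but with the kernel.shmall kernel parameter left at its default of 2097152, at most 8 GB of memory can be used (2097152 pages of 4096 bytes each). When the SGA is configured larger than that, instance startup fails.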

SQL> startup nomount
ORA-27102: out of memory
Linux-x86_64 Error: 28: No space left on device

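This parameter needs to be changed to shmmax/PAGE_SIZE (the page size is obtained with getconf PAGE_SIZE). In this deployment the customer's machines had 64 GB of memory and PAGE_SIZE = 4096, so kernel.shmall should be set to 16475728.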

2. The documentation requires running rootpre.sh before installing CRS, but it reports an error like "No OraCM running". According to Metalink Note 405986.1, this error can be ignored.

3. When running root.sh at the end of the CRS installation, it failed with:
PROT-1: Failed to initialize ocrconfig
Failed to upgrade Oracle Cluster Registry configuration

After remounting the OCFS2 file system and running root.sh again, the problem went away.

umount /u02/oradata/system
mount -L "vol_system" /u02/oradata/system -o nointr,datavolume

4. The server-side service load balancing problem is still there: connections only work normally after the remote_listener parameter is set to empty.
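
For item 1 above, the kernel.shmall arithmetic can be checked with a few lines of Python. This is a sketch only; the function names are invented for illustration. Note that the post does not state the shmmax value that was configured: 16475728 pages of 4096 bytes corresponds to 67484581888 bytes (about 62.85 GiB), and that figure is inferred from the post's number rather than taken from it.

```python
def max_shared_bytes(shmall_pages, page_size):
    """Total shared memory in bytes that kernel.shmall allows system-wide."""
    return shmall_pages * page_size


def shmall_for(shmmax_bytes, page_size):
    """kernel.shmall in pages for a given shmmax, rounded up to whole pages."""
    return -(-shmmax_bytes // page_size)


PAGE_SIZE = 4096  # value reported by `getconf PAGE_SIZE` in the post
GIB = 2**30

if __name__ == "__main__":
    # Default from the post: 2097152 pages.
    print(max_shared_bytes(2097152, PAGE_SIZE) / GIB, "GiB with the default")

    # Value chosen in the post: 16475728 pages.
    print(max_shared_bytes(16475728, PAGE_SIZE) / GIB, "GiB with 16475728")

    # For comparison only (an assumption, not the post's setting):
    # pages needed to cover a full 64 GiB.
    print(shmall_for(64 * GIB, PAGE_SIZE), "pages for 64 GiB")
```

The rounding up in `shmall_for` matters only when shmmax is not a multiple of the page size; for the values in this post the division is exact.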

Written by kamus

July 19th, 2009 at 11:03 pm

Posted in Oracle RDBMS
