What do the bug status codes in MOS mean?

When you browse the bug database in MOS (My Oracle Support), you will see that every bug has its own status. Which statuses are common, and what does each one mean?

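The Advanced Search feature in MOS can find bugs in a particular status. For example: which bugs have been confirmed and handed over to Development but do not yet have a patch?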

The most common statuses are listed below. Taken together, they also outline the life cycle of a bug from filing to resolution.

10 – Description Phase
Development is requesting more information.

16 – Support bug screening
Bug is being reviewed by our Bug Diagnostics group.

11 – Code Bug (Response/Resolution)
Bug is being worked by Development: it has been confirmed as a bug and Development is working on a fix.

30 – Additional Information Requested
Bug is being worked by Support and/or more information was requested by Development. Development cannot yet pin down the problem from the existing bug description and the uploaded logs, screenshots, and so on, and is asking the filer for more detailed information.

37 – To Filer for Review/Merge Required
Bug has been fixed but the patch will be merged into the next patchset.

80 – Development to Q/A
Bug has been handed to Q/A and is being regression tested for a future release.

81 – Q/A to Dev/Patch or Workaround Avble
Patch released via Metalink.

90 – Closed, Verified by Filer
Bug has been fixed and is closed.

91 – Closed, Could Not Reproduce
Bug is closed as not reproducible.

92 – Closed, Not a Bug
Bug is closed as not a bug (not reproducible or setup issue), for example a problem with the customer's installation.

93 – Closed, Not Verified by Filer
Bug has been fixed and is closed.

95 – Closed, Vendor OS Problem
Bug is closed as an OS problem.

96 – Closed, Duplicate Bug
Bug is closed as a duplicate bug: the same problem has already been reported.

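Of these, 10 and 16 mean that Support (that is, Oracle's OSS) has filed a bug, but Development has not yet confirmed and accepted it as a genuine bug.
11 means the bug has been handed over to Development and the developers are working on a fix that is not yet complete.
80, 81, 90, and 93 mean the bug has been fixed and the patch can be downloaded or requested.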
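
As a quick reference, the list and the summary above can be captured in a small lookup table. This is only a sketch: the codes and names are copied from the list above, while the phase labels and the `describe` helper are illustrative names, not anything defined by MOS.

```python
# Status codes and names exactly as listed above.
BUG_STATUS = {
    10: "Description Phase",
    11: "Code Bug (Response/Resolution)",
    16: "Support bug screening",
    30: "Additional Information Requested",
    37: "To Filer for Review/Merge Required",
    80: "Development to Q/A",
    81: "Q/A to Dev/Patch or Workaround Avble",
    90: "Closed, Verified by Filer",
    91: "Closed, Could Not Reproduce",
    92: "Closed, Not a Bug",
    93: "Closed, Not Verified by Filer",
    95: "Closed, Vendor OS Problem",
    96: "Closed, Duplicate Bug",
}

# Grouping taken from the summary; the labels themselves are made up here.
PHASES = {
    "not yet accepted by Development": {10, 16},
    "being fixed by Development": {11},
    "fixed": {80, 81, 90, 93},
}


def describe(code: int) -> str:
    """Return a one-line description of a bug status code."""
    name = BUG_STATUS.get(code)
    if name is None:
        return f"{code}: unknown status"
    phase = next(
        (label for label, codes in PHASES.items() if code in codes),
        "not covered by the summary",
    )
    return f"{code} - {name} [{phase}]"


if __name__ == "__main__":
    for code in (16, 11, 81, 96):
        print(describe(code))
```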

Install 11.2.0.2 RAC on OEL5.5 x86-64 (root.sh issue on second node)

When installing 11.2.0.2 RAC, the first step is to install Grid. Running root.sh on the second node failed with the following errors:

Start of resource "ora.ctssd" failed
CRS-2672: Attempting to start 'ora.ctssd' on 'xsh-server2'
CRS-2674: Start of 'ora.ctssd' on 'xsh-server2' failed
CRS-4000: Command Start failed, or completed with errors.
Cluster Time Synchronisation Service  start in exclusive mode failed at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 6455.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed

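The errors show that the ctssd process failed to start. (Before this point the output shows cssd starting successfully, which differs from some other second-node root.sh failures described on MOS, where cssd itself fails to start.) The ctssd startup log, located under $GRID_HOME/log/ctssd, contains the following errors.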

2010-11-12 18:55:46.132: [    GIPC][2424495392] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 687], original from [clsss.c : 5325]
[ default][2424495392]Failure 4 in trying to open SV key SYSTEM.version.localhost

[ default][2424495392]procr_open_key error 4 errorbuf : PROCL-4: The local registry key to be operated on does not exist.

2010-11-12 18:55:46.135: [    CTSS][2424495392]clsctss_r_av2: Error [3] retrieving Active Version from OLR. Returns [19].
2010-11-12 18:55:46.138: [    CTSS][2424495392](:ctss_init16:): Error [19] retrieving active version. Returns [19].
2010-11-12 18:55:46.138: [    CTSS][2424495392]ctss_main: CTSS init failed [19]
2010-11-12 18:55:46.138: [    CTSS][2424495392]ctss_main: CTSS daemon aborting [19].
2010-11-12 18:55:46.138: [    CTSS][2424495392]CTSS daemon aborting

The crsctl output also shows that ora.cssd started successfully while ora.ctssd is OFFLINE.

 $ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                                                   
ora.cluster_interconnect.haip
      1        OFFLINE OFFLINE                                                   
ora.crf
      1        OFFLINE OFFLINE                                                   
ora.crsd
      1        OFFLINE OFFLINE                                                   
ora.cssd
      1        ONLINE  ONLINE       xsh-server2                                  
ora.cssdmonitor
      1        ONLINE  ONLINE       xsh-server2                                  
ora.ctssd
      1        ONLINE  OFFLINE                                                   
ora.diskmon
      1        ONLINE  ONLINE       xsh-server2                                  
ora.drivers.acfs
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        OFFLINE OFFLINE                                                   
ora.gipcd
      1        ONLINE  ONLINE       xsh-server2                                  
ora.gpnpd
      1        ONLINE  ONLINE       xsh-server2                                  
ora.mdnsd
      1        ONLINE  ONLINE       xsh-server2   
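
On a long listing, the resources whose TARGET is ONLINE but whose STATE is not are easy to miss. The following sketch picks them out of text in the layout shown above. The function name and the abridged sample are illustrative, and the parser assumes exactly this two-line-per-resource layout.

```python
def find_failed_resources(output: str) -> list[str]:
    """Return resources whose TARGET is ONLINE but whose STATE is not."""
    failed = []
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("ora."):
            current = stripped  # resource name line
            continue
        fields = stripped.split()
        # Instance lines look like: "1  ONLINE  OFFLINE  [server]"
        if current and len(fields) >= 3 and fields[0].isdigit():
            target, state = fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                failed.append(current)
    return failed


# Abridged sample in the same layout as the listing above.
SAMPLE = """\
ora.crsd
      1        OFFLINE OFFLINE
ora.cssd
      1        ONLINE  ONLINE       xsh-server2
ora.ctssd
      1        ONLINE  OFFLINE
"""

if __name__ == "__main__":
    print(find_failed_resources(SAMPLE))  # ['ora.ctssd']
```

Resources with TARGET OFFLINE, such as ora.crsd here, are deliberately skipped: they have not been asked to start yet, so they do not indicate a failure.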

Running the same command on the first node at this point shows every resource ONLINE as expected. Checking cssd.log next (under $GRID_HOME/log/cssd) reveals errors during ASM disk discovery.

2010-11-12 13:44:30.505: [   SKGFD][1087203648]UFS discovery with :ORCL:VOL*:

2010-11-12 13:44:30.505: [   SKGFD][1087203648]OSS discovery with :ORCL:VOL*:

2010-11-12 13:44:30.505: [   SKGFD][1087203648]Discovery with asmlib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: str :ORCL:VOL*:

2010-11-12 13:44:30.505: [   SKGFD][1087203648]Fetching asmlib disk :ORCL:VOL1:

2010-11-12 13:44:30.505: [   SKGFD][1087203648]Fetching asmlib disk :ORCL:VOL2:

2010-11-12 13:44:30.505: [   SKGFD][1087203648]Fetching asmlib disk :ORCL:VOL3:

2010-11-12 13:44:30.505: [   SKGFD][1087203648]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted
)
2010-11-12 13:44:30.505: [   SKGFD][1087203648]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted
)
2010-11-12 13:44:30.505: [   SKGFD][1087203648]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted

Note that the same errors appear on the first node as well, yet all resources there, including the ASM disk group, run normally.

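To address the cssd.log errors above, follow MOS Note [1050164.1]: edit /etc/sysconfig/oracleasm-_dev_oracleasm to tell ASMLib which disks to scan and which to ignore during discovery. Our environment uses Multipath across several disks, so devices whose names start with dm must be included and those starting with sd excluded. This problem should only occur on disks managed by Multipath.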

# ORACLEASM_SCANORDER: Matching patterns to order disk scanning
ORACLEASM_SCANORDER="dm"

# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE="sd"

The following check confirms whether you have hit this problem.

# ls -l /dev/oracleasm/disks
brw-rw---- 1 oracle dba 3, 65 May 14 12:08 CRSVOL
# cat /proc/partitions
  3 65 4974448 sda
253  1 4974448 dm-1

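Here the major and minor numbers of CRSVOL, an ASM disk created with oracleasm, are 3 and 65. Those are the numbers of /dev/sda, not of /dev/dm-1, which means the ASM disk is not bound to the Multipath device. Typically node 1 is correct and node 2 is not, and that is what causes the problem.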
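
The same comparison can be automated when there are many disks. The sketch below works on the text of the two commands above rather than calling them, so it can be tried anywhere. The function name is illustrative, and the parsing assumes the `ls -l` and /proc/partitions layouts shown in this post.

```python
import re


def asm_disk_devices(ls_output: str, partitions: str) -> dict[str, str]:
    """Map each ASMLib disk name to the kernel device with the same major/minor."""
    # (major, minor) -> device name, from /proc/partitions
    by_number = {}
    for line in partitions.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            by_number[(int(fields[0]), int(fields[1]))] = fields[3]

    result = {}
    pattern = re.compile(r"^b\S+\s+\d+\s+\S+\s+\S+\s+(\d+),\s*(\d+)\s+.*\s(\S+)$")
    for line in ls_output.splitlines():
        # Block device lines show "major, minor" where a file size would be.
        match = pattern.match(line.strip())
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            result[match.group(3)] = by_number.get(key, "unknown")
    return result


LS_OUTPUT = "brw-rw---- 1 oracle dba 3, 65 May 14 12:08 CRSVOL"
PARTITIONS = """\
  3 65 4974448 sda
253  1 4974448 dm-1
"""

if __name__ == "__main__":
    for disk, device in asm_disk_devices(LS_OUTPUT, PARTITIONS).items():
        status = "ok" if device.startswith("dm-") else "NOT a multipath device"
        print(f"{disk}: {device} ({status})")
```

With the sample above, CRSVOL resolves to sda, so it is flagged as not being a multipath device.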

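After fixing the above, the Grid environment must be deconfigured and then reconfigured; simply rerunning root.sh on the failed node is not enough (I lost a lot of time here). For the reconfiguration steps, see MOS Note [942166.1], How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation. After that, root.sh completed successfully on the second node.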

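Looking back at the earlier installation output once the problem was solved, I found that although the first node had reported every resource as healthy, its root.sh output was missing a few lines compared with a normal run.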

The normal output looks like this:

# $GRID_HOME/root.sh
Running Oracle 11g root script...

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]: 
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE 
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-2672: Attempting to start 'ora.mdnsd' on 'xsh-server1'
CRS-2676: Start of 'ora.mdnsd' on 'xsh-server1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'xsh-server1'
CRS-2676: Start of 'ora.gpnpd' on 'xsh-server1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xsh-server1'
CRS-2672: Attempting to start 'ora.gipcd' on 'xsh-server1'
CRS-2676: Start of 'ora.cssdmonitor' on 'xsh-server1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'xsh-server1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xsh-server1'
CRS-2672: Attempting to start 'ora.diskmon' on 'xsh-server1'
CRS-2676: Start of 'ora.diskmon' on 'xsh-server1' succeeded
CRS-2676: Start of 'ora.cssd' on 'xsh-server1' succeeded

ASM created and started successfully.

Disk Group CRSDG created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Successful addition of voting disk 67463e71af084f76bf98b3ee55081e40.
Successfully replaced voting disk group with +CRSDG.
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   67463e71af084f76bf98b3ee55081e40 (ORCL:VOL1) [CRSDG]
Located 1 voting disk(s).

CRS-2672: Attempting to start 'ora.asm' on 'xsh-server1'
CRS-2676: Start of 'ora.asm' on 'xsh-server1' succeeded
CRS-2672: Attempting to start 'ora.CRSDG.dg' on 'xsh-server1'
CRS-2676: Start of 'ora.CRSDG.dg' on 'xsh-server1' succeeded
ACFS-9200: Supported
ACFS-9200: Supported
CRS-2672: Attempting to start 'ora.registry.acfs' on 'xsh-server1'
CRS-2676: Start of 'ora.registry.acfs' on 'xsh-server1' succeeded
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

The earlier output lacked the following four lines.

LOCAL ADD MODE 
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
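
Spotting such gaps by eye is error-prone, partly because two of the four lines ("Creating OCR keys..." and "Operation successful.") legitimately appear a second time later in a normal run. A count-aware comparison handles that. The sketch below is one way to do it; the function name and the abridged sample logs are illustrative, not the full outputs.

```python
from collections import Counter


def missing_lines(reference: str, actual: str) -> list[str]:
    """Return lines of `reference` that `actual` lacks, counting repeats."""
    remaining = Counter(
        line.strip() for line in actual.splitlines() if line.strip()
    )
    missing = []
    for line in reference.splitlines():
        key = line.strip()
        if not key:
            continue
        if remaining[key] > 0:
            remaining[key] -= 1  # consume one matching occurrence
        else:
            missing.append(key)
    return missing


# Abridged samples built from lines quoted in this post.
GOOD_RUN = """\
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
"""
BAD_RUN = """\
Adding daemon to inittab
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
"""

if __name__ == "__main__":
    for line in missing_lines(GOOD_RUN, BAD_RUN):
        print(line)
```

A plain set difference would report only two missing lines here, because the repeated lines still occur once in the shorter output. Lines that contain host names would need normalizing before comparing logs from different nodes.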

Oracle obviously will not acknowledge this as a bug. Oh well, the problem is solved, and that is what matters.

DB time VS. DB CPU

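Presenting system load effectively is an essential skill in performance tuning. We usually rely on the Time Model that Oracle provides, for example to draw a trend chart showing how the load rises and falls over time.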

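Such a chart can be obtained directly from OEM in Oracle 10g and later, or by exporting SQL query results to Excel and charting them there. The point here is not how to draw the chart, but what the metrics we pick actually mean: what is DB time? What is DB CPU?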

In fact, the official documentation already explains this (I wish I had noticed it earlier): V$SESS_TIME_MODEL

The tree diagram of the time model in that page is well worth consulting.

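In summary (corrections are welcome if anything here is wrong):
1. The total time consumed by the database is background elapsed time + DB time. In a normal system, DB time is far larger than background elapsed time (the time consumed by database background processes, such as PMON itself).
2. DB time comprises DB CPU + sql execute elapsed time + parse time elapsed + the other elapsed-time statistics. In a normal system the first three account for more than 99% of DB time, and sql execute elapsed time alone should be above 95%. Note, however, that DB CPU and sql execute elapsed time overlap, which is why an AWR report can show DB CPU + sql execute elapsed time adding up to more than 100% of DB time.
3. DB time is elapsed time, measured in microseconds (millionths of a second). Its STAT_NAME in V$SYS_TIME_MODEL is "DB time".
4. DB CPU is the time the CPU actually spent working; it does not include the time database processes spent waiting for CPU. It is also measured in microseconds. Its STAT_NAME in V$SYS_TIME_MODEL is "DB CPU".
5. The 'CPU + Wait for CPU' figure we often see in ASH reports corresponds to DB time, in that it is elapsed time and includes time spent waiting for CPU; the plain CPU figure corresponds to DB CPU.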
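
To make the arithmetic in the points above concrete, here is a small sketch that turns time-model values into percentages of DB time. The numbers are invented for illustration, not measurements; in practice they would be deltas of the microsecond values between two samples of V$SYS_TIME_MODEL. The average active sessions figure (DB time divided by wall-clock time) is the metric discussed in the first article listed below, and the function name is illustrative.

```python
def time_model_report(stats_us: dict[str, int], elapsed_seconds: float) -> dict[str, float]:
    """Express each statistic as a percentage of DB time, plus average active sessions.

    stats_us: STAT_NAME -> microseconds accumulated over the interval.
    elapsed_seconds: wall-clock length of the interval.
    """
    db_time_us = stats_us["DB time"]
    report = {
        name: 100.0 * value / db_time_us
        for name, value in stats_us.items()
        if name != "DB time"
    }
    # DB time is in microseconds; convert to seconds before dividing.
    report["average active sessions"] = (db_time_us / 1_000_000) / elapsed_seconds
    return report


# Hypothetical deltas over a 30-minute (1800 s) interval.
SAMPLE = {
    "DB time": 3_600_000_000,
    "DB CPU": 1_800_000_000,
    "sql execute elapsed time": 3_420_000_000,
    "parse time elapsed": 36_000_000,
}

if __name__ == "__main__":
    for name, value in time_model_report(SAMPLE, 1800).items():
        print(f"{name}: {value:.1f}")
```

With these made-up values, DB CPU is 50% and sql execute elapsed time is 95% of DB time. Their sum exceeds 100% because the two overlap, as noted in point 2.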

In addition, the following three articles are also well worth reading:
Average active sessions: the magic metric? (PDF by John Beresniewicz)
Time Matters – DB Time (from Doug’s Oracle Blog)
Time Matters – DB CPU (from Doug’s Oracle Blog)