Channel [K]

Yesterday a customer asked me how Oracle RAC keeps its nodes in sync, so today I've organized my notes on the topic.

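Because all the nodes of an Oracle RAC database share a single copy of the datafiles, there is nothing to synchronize at the disk-storage level. What "synchronization between nodes" really means is keeping the SGAs of the nodes consistent. In more precise terms, what is being asked about is the SCN propagation algorithm.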

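In Oracle9i and Oracle10gR1, SCN propagation uses the Lamport scheme by default. It is controlled by the initialization parameter MAX_COMMIT_PROPAGATION_DELAY, and the default propagation interval is 7 seconds: in the worst case, the other nodes learn about an update made on one node only 7 seconds later. Setting the parameter to 0 requires every transaction to notify the other nodes of the SCN change immediately at commit. This mode is called BOC (Broadcast On Commit).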
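To make the 7-second window concrete, here is a toy model of two nodes, not Oracle's implementation. It assumes MAX_COMMIT_PROPAGATION_DELAY is expressed in hundredths of a second (so 700 corresponds to the 7-second default), and it assumes the worst case where the committing node's SCN reaches the other node only after the full delay. The `Node` class and the helper names are hypothetical.

```python
# Toy model of SCN visibility between two RAC nodes. NOT Oracle's implementation:
# the Node class, the worst-case timing rule and the helper names are hypothetical.
from dataclasses import dataclass


def delay_seconds(max_commit_propagation_delay: int) -> float:
    """Convert MAX_COMMIT_PROPAGATION_DELAY to seconds.

    Assumption: the parameter is in hundredths of a second, so 700 -> 7.0 s.
    """
    return max_commit_propagation_delay / 100.0


@dataclass
class Node:
    scn: int = 0

    def commit(self) -> int:
        # A local commit advances this node's SCN and returns the commit SCN.
        self.scn += 1
        return self.scn

    def receive(self, remote_scn: int) -> None:
        # Lamport-style update: adopt the larger SCN, never move backwards.
        # (Classic Lamport clocks also add 1 on receipt; omitted, only ordering matters here.)
        self.scn = max(self.scn, remote_scn)


def remote_sees_commit(seconds_after_commit: float, max_commit_propagation_delay: int) -> bool:
    """Worst case: does a query on node B, started this long after a commit on
    node A, run at an SCN that already covers that commit?"""
    a, b = Node(scn=100), Node(scn=100)
    commit_scn = a.commit()
    # A's SCN reaches B only once the full delay has elapsed;
    # with a delay of 0 (BOC) it is broadcast at commit time.
    if seconds_after_commit >= delay_seconds(max_commit_propagation_delay):
        b.receive(commit_scn)
    return b.scn >= commit_scn


print(remote_sees_commit(3.0, 700))  # Lamport, 7 s window: False
print(remote_sees_commit(3.0, 0))    # BOC: True
```

The model only captures ordering: under the Lamport scheme a query on the other node may run at an SCN that predates the commit for up to the whole delay, while with BOC the gap disappears at the cost of one broadcast per commit.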

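In Oracle10gR2 and Oracle11g, BOC became the default SCN propagation scheme. The initialization parameter MAX_COMMIT_PROPAGATION_DELAY was deprecated, and its role passed to the hidden parameter _IMMEDIATE_COMMIT_PROPAGATION, which defaults to TRUE. My hidden-parameter script shows the current settings: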

SQL> @hidden
Enter value for par: propagation
old  14: x.ksppinm like '%_&par%'
new  14: x.ksppinm like '%_propagation%'

NAME                                     VALUE                     ISDEFAULT ISMOD      ISADJ
---------------------------------------- ------------------------- --------- ---------- -----
_evt_system_event_propagation            TRUE                      TRUE      FALSE      FALSE
_immediate_commit_propagation            TRUE                      TRUE      FALSE      FALSE
max_commit_propagation_delay             0                         TRUE      FALSE      FALSE

In Oracle11g RAC, BOC was improved further. The changes include:

o The number of outstanding broadcasts increased from 3 to 8.
This improves throughput but does not affect latency.

o LGWR can now issue direct and indirect sends.
This frees up the local LMS processes and improves latency.

o Processing is not limited to LMS0. The SCN is hashed to determine which LMS process will send the message (indirect send) or process the broadcast and send the ACK back to the broadcasting node.
This improves general performance by reducing the load on the local (indirect sends) and remote LMS0 processes.

o Broadcast and acknowledgement messages are no longer blocked by DRM events.
This improves BOC latency by eliminating the up to 0.5-second delay introduced by Dynamic Remastering.

o All Cache Fusion messages can now carry the broadcast SCN.
This reduces the need for explicit broadcasts, thereby reducing the number of messages on the private interconnect and possibly reducing latency.
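Why raising the limit on outstanding broadcasts helps throughput but not latency follows from Little's law: with at most N broadcasts awaiting ACKs and a round trip of W, the sustainable rate is bounded by N / W, while each individual broadcast still waits one W. A back-of-the-envelope sketch, where the 1 ms round trip is a hypothetical figure rather than a measurement:

```python
# Back-of-the-envelope model of the "outstanding broadcasts" limit.
# The round-trip time used below is hypothetical, not a measured value.


def max_broadcast_rate(outstanding: int, round_trip_ms: float) -> float:
    """Upper bound on broadcasts per second when at most `outstanding`
    broadcasts may be awaiting ACKs at once (Little's law: L = lambda * W)."""
    return outstanding * 1000.0 / round_trip_ms


def broadcast_latency_ms(round_trip_ms: float) -> float:
    # Each individual broadcast still waits one full round trip for its ACKs,
    # no matter how many others are in flight.
    return round_trip_ms


for limit in (3, 8):
    print(limit, max_broadcast_rate(limit, 1.0), broadcast_latency_ms(1.0))
# 3 3000.0 1.0
# 8 8000.0 1.0
```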

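With indirect send, when a local transaction commits, LGWR hashes the SCN and uses the hash value to choose one of the local LMS processes (previously this was always LMS0). That local LMS process then uses the same hash value to choose the remote LMS process to notify on the other nodes. This balances the workload across the LMS processes on every node.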
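The routing idea can be sketched as follows. Oracle's actual hash function is not described here, so `scn % lms_count` is a hypothetical stand-in, and the node names and LMS counts are made up for illustration:

```python
# Sketch of SCN-hash-based LMS selection for an indirect send.
# `scn % lms_count` is a hypothetical stand-in for Oracle's real (undocumented) hash;
# node names and LMS counts are illustrative only.
from collections import Counter


def pick_lms(scn: int, lms_count: int) -> int:
    """Map an SCN to one of `lms_count` LMS processes (LMS0 .. LMS<n-1>)."""
    return scn % lms_count


def route_broadcast(scn: int, local_lms_count: int, remote_lms_counts: dict[str, int]):
    """Return the local LMS that sends the broadcast and, for every remote
    node, the LMS that processes it and sends the ACK back."""
    local = f"LMS{pick_lms(scn, local_lms_count)}"
    remote = {node: f"LMS{pick_lms(scn, n)}" for node, n in remote_lms_counts.items()}
    return local, remote


print(route_broadcast(1001, 4, {"node2": 4, "node3": 2}))
# ('LMS1', {'node2': 'LMS1', 'node3': 'LMS1'})

# Over many commits the work spreads over all LMS processes instead of LMS0 alone.
print(Counter(pick_lms(scn, 4) for scn in range(1000, 2000)))
```

With 4 LMS processes and 1000 consecutive SCNs, each process handles 250 broadcasts; a real hash would aim for the same even spread without depending on SCNs being consecutive.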

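To reduce the load on the local LMS processes even further, there is now also direct send. In this mode, LGWR itself delivers the BOC message to the LMS processes on the other nodes. The remote LMS processes still send their acknowledgements (ACKs) back to a local LMS process, which then notifies LGWR.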
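The difference between the two modes is easiest to see by listing the hops for a single broadcast. This is a simplified model of the description above: process names are labels rather than real process identifiers, and the LGWR to local LMS step in indirect mode is treated as one hop even though it is a local post, not a network send.

```python
# Message hops for one broadcast on commit, following the description above.
# Simplified model: names are labels, and every hop is counted equally.


def boc_hops(mode: str, remote_nodes: list[str]) -> list[tuple[str, str]]:
    if mode not in ("indirect", "direct"):
        raise ValueError("mode must be 'indirect' or 'direct'")
    hops = []
    if mode == "indirect":
        # LGWR hands the broadcast to a local LMS, which sends it on.
        hops.append(("LGWR", "local LMS"))
        sender = "local LMS"
    else:
        # Direct send: LGWR sends the broadcast to the remote nodes itself.
        sender = "LGWR"
    for node in remote_nodes:
        hops.append((sender, f"{node} LMS"))
    # In both modes the remote ACKs come back to a local LMS ...
    for node in remote_nodes:
        hops.append((f"{node} LMS", "local LMS"))
    # ... which then notifies LGWR.
    hops.append(("local LMS", "LGWR"))
    return hops


def local_lms_sends(hops: list[tuple[str, str]]) -> int:
    """Number of messages the local LMS has to send for this broadcast."""
    return sum(1 for sender, _ in hops if sender == "local LMS")


nodes = ["node2", "node3"]
print(local_lms_sends(boc_hops("indirect", nodes)))  # 3
print(local_lms_sends(boc_hops("direct", nodes)))    # 1
```

In a three-node cluster, direct send leaves the local LMS with only the final notification to LGWR, while the ACK path is unchanged.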

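The log file sync wait runs from the moment the committing transaction posts LGWR to write redo until LGWR has finished the write and has been notified by the local LMS process that all the other nodes have returned their BOC ACKs. So in a RAC environment, both the efficiency of the LMS processes and the speed of BOC delivery directly affect how much log file sync waiting there is, and therefore the system's response time. That is why Oracle keeps stressing that, at the OS level, the LGWR and LMS processes should run under a Real Time scheduling policy, and that the RAC interconnect must stay unobstructed, with high throughput and low latency.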

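If LGWR finishes writing the local redo log before it has received the BOC ACKs from all the other nodes, the commit ends up waiting on the broadcast rather than on the disk write. This usually means the CPUs are short of capacity or the private interconnect is performing badly.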
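The completion rule can be written as a one-line maximum. This is a simplified sketch that assumes the redo write and the broadcast start at the same instant and ignores the small post/notification overheads; the function name and the timings are hypothetical:

```python
# Simplified model of when a commit's log file sync wait ends in RAC with BOC.
# Assumptions: redo write and broadcast start together; post/notify overheads
# are ignored. Function name and timings are hypothetical.


def log_file_sync_ms(redo_write_ms: float, ack_ms_by_node: dict[str, float]) -> tuple[float, str]:
    """Return (wait time, what the commit ended up waiting on)."""
    slowest_ack = max(ack_ms_by_node.values(), default=0.0)
    # The commit completes only when BOTH the local redo write is done
    # AND every other node has acknowledged the broadcast.
    wait = max(redo_write_ms, slowest_ack)
    bottleneck = "redo write" if redo_write_ms >= slowest_ack else "BOC ACK"
    return wait, bottleneck


print(log_file_sync_ms(2.0, {"node2": 0.5, "node3": 0.8}))  # (2.0, 'redo write')
print(log_file_sync_ms(1.0, {"node2": 0.4, "node3": 3.5}))  # (3.5, 'BOC ACK')
```

The second call is the situation described above: the redo is already on disk, but the commit keeps waiting on one slow node's ACK.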

According to Metalink Bug 5455094, in an Oracle9iR2 RAC environment, if db_cache_size + db_keep_cache_size together exceed 50G, the instance raises ORA-00064 at startup and cannot start.

00064, 00000, "object is too large to allocate on this O/S (%s,%s)"
// *Cause: An initialization parameter was set to a value that required
// allocating more contiguous space than can be allocated on this
// operating system.
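A quick way to tell whether a given configuration falls into the range the bug note describes is sketched below. It assumes "50G" means 50 GiB (Oracle's G suffix in parameter values is 1024**3 bytes) and treats the limit as strict, since the note says "exceed"; `parse_size` and `hits_ora_00064` are hypothetical helpers.

```python
# Check whether a 9iR2 RAC buffer cache configuration falls into the range
# described for Bug 5455094. Assumptions: "50G" = 50 GiB and the limit is strict
# (> 50G). parse_size and hits_ora_00064 are hypothetical helpers.

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
BUG_5455094_LIMIT = 50 * UNITS["G"]


def parse_size(value: str) -> int:
    """Parse an init.ora-style size such as '48G', '512M' or a plain byte count."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def hits_ora_00064(db_cache_size: str, db_keep_cache_size: str = "0") -> bool:
    total = parse_size(db_cache_size) + parse_size(db_keep_cache_size)
    return total > BUG_5455094_LIMIT


print(hits_ora_00064("40G", "8G"))  # False: 48G in total
print(hits_ora_00064("48G", "4G"))  # True: 52G in total
```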

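Metalink offers two suggestions:
1. Reduce the cache size. (A pointless suggestion; we simply ignored it.)
2. Set _ksmg_granule_size to a larger value, e.g. _ksmg_granule_size=67108864 (64MB). In our tests, once this parameter was set, the database performed poorly and responded sluggishly.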

This customer's SGA is as large as 200G, so neither approach was an option. A crude workaround, however, gets there by a roundabout route:

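1. Set SGA_MAX_SIZE = 200G. This parameter does not trigger Bug 5455094.
2. Set db_cache_size to a smaller value, small enough not to trigger the bug.
3. After every instance startup, have a script automatically run an alter system command to enlarge the buffer cache, for example: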

alter system set db_cache_size = 214753888696 scope=memory;
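The three steps can be captured in a small helper that produces the startup settings and the post-startup statement. This is a sketch: the sizes are illustrative rather than the customer's real values, "50G" is assumed to mean 50 GiB, db_keep_cache_size is left out, and `plan_startup` with its checks is a hypothetical helper.

```python
# Sketch of the startup workaround above. Sizes are illustrative, not the
# customer's real values; plan_startup and its checks are hypothetical.

GIB = 1024 ** 3
BUG_5455094_LIMIT = 50 * GIB  # assumed to mean 50 GiB, as in the bug note


def plan_startup(sga_max_size: int, startup_cache: int, target_cache: int) -> dict:
    """Return init.ora lines for startup and the statement to run afterwards."""
    if startup_cache > BUG_5455094_LIMIT:
        raise ValueError("startup db_cache_size would still trigger the bug")
    if not startup_cache <= target_cache < sga_max_size:
        # Crude sanity check only: the shared pool and other SGA components
        # also need room inside SGA_MAX_SIZE.
        raise ValueError("target cache must fit inside SGA_MAX_SIZE")
    return {
        "init_ora": [f"sga_max_size={sga_max_size}", f"db_cache_size={startup_cache}"],
        "after_startup": f"alter system set db_cache_size = {target_cache} scope=memory;",
    }


plan = plan_startup(200 * GIB, 40 * GIB, 150 * GIB)
print("\n".join(plan["init_ora"]))
print(plan["after_startup"])
# sga_max_size=214748364800
# db_cache_size=42949672960
# alter system set db_cache_size = 161061273600 scope=memory;
```

Because the statement uses scope=memory, the larger cache is lost at the next restart, which is why the script has to run after every startup.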


About SCN propagation in Oracle RAC

ORA-00064 When Oracle Database Instance Startup