(1) AWR report shows bogus wait events and times on SPARC T5 servers
Here is a sample from one of the Oracle 11g R2 databases running on a SPARC T5 server with Solaris 11.1 SRU 7.5:
Top 5 Timed Foreground Events
Event | Waits | Time(s) | Avg wait (ms) | % DB time | Wait Class
---|---|---|---|---|---
latch: cache buffers chains | 278,727 | 812,447,335 | 2914850 | 13307324.15 | Concurrency |
library cache: mutex X | 212,595 | 449,966,330 | 2116542 | 7370136.56 | Concurrency |
buffer busy waits | 219,844 | 349,975,251 | 1591925 | 5732352.01 | Concurrency |
latch: In memory undo latch | 25,468 | 37,496,800 | 1472310 | 614171.59 | Concurrency |
latch free | 2,602 | 24,998,583 | 9607449 | 409459.46 | Other |
Reason:
Unknown. There is a pending bug, 17214885 - "Implausible top foreground wait times reported in AWR report". To see how implausible these numbers are: an average wait of 2,914,850 ms works out to nearly 49 minutes per latch acquisition, which is clearly not credible.
Tentative workaround:
Disable power management as shown below.
# poweradm set administrative-authority=none
# svcadm disable power
# svcadm enable power
Verify the setting by running poweradm list.
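For a quick check, the property of interest can be filtered out of the listing (the exact output format may vary by Solaris release):
# poweradm list | grep administrative-authority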
Also disable NUMA I/O object binding by setting the following parameter in /etc/system (requires a system reboot).
set numaio_bind_objects=0
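After the reboot, one way to confirm that the tunable took effect is to read the kernel variable back with mdb(1). This is a sketch that assumes the numaio_bind_objects symbol is visible in the running kernel:
# echo "numaio_bind_objects/D" | mdb -k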
Oracle Solaris 11 added support for the NUMA I/O architecture. Here is a brief explanation of NUMA I/O from the Oracle Solaris 11 What's New page:
Non-Uniform Memory Access (NUMA) I/O : Many modern systems are based on a NUMA architecture, where each CPU or set of CPUs is associated with its own physical memory and I/O devices. For best I/O performance, the processing associated with a device should be performed close to that device, and the memory used by that device for DMA (Direct Memory Access) and PIO (Programmed I/O) should be allocated close to that device as well. Oracle Solaris 11 adds support for this architecture by placing operating system resources (kernel threads, interrupts, and memory) on physical resources according to criteria such as the physical topology of the machine, specific high-level affinity requirements of I/O frameworks, actual load on the machine, and currently defined resource control and power management policies.
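As an aside, the locality group (NUMA) topology that Solaris has constructed for a machine can be inspected with the lgrpinfo(1) utility, for example:
# lgrpinfo -a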
Do not forget to roll back these changes after applying the fix for database bug 17214885, when available.
(2) Redo logs on F40 PCIe cards (non-volatile flash storage)
Per the F40 PCIe card user's guide, the Sun Flash Accelerator F40 PCIe Card is designed to deliver its best performance for data transfers that are multiples of 8K in size and use addresses that are 8K aligned. To achieve optimal performance, the size of the read/write data should be an integer multiple of this block size, and the data transferred should be block aligned. I/O operations that are not block aligned, or that use sizes that are not a multiple of the block size, may suffer performance degradation, especially write operations.
Oracle redo log files default to a block size equal to the physical sector size of the disk, typically 512 bytes, and in a normally functioning database the redo log is one of the most heavily written files. Writes smaller than the card's native block size can force costly read-modify-write cycles inside the device. Oracle database supports a maximum block size of 4K for redo logs, so to get as close as possible to optimal performance for redo write operations on F40 PCIe cards, tune the environment as shown below.
- Configure the following init parameters:
_disk_sector_size_override=TRUE
_simulate_disk_sectorsize=4096
- Create redo log files with a 4K block size, e.g. (a verification sketch follows this list):
SQL> ALTER DATABASE ADD LOGFILE '/REDO/redo.dbf' size 20G blocksize 4096;
- [Solaris only] Append the following line to /kernel/drv/sd.conf (requires a reboot):
sd-config-list="ATA 3E128-TS2-550B01","disksort:false, cache-nonvolatile:true, physical-block-size:4096";
- [Solaris only][F20] To enable maximum throughput from the MPT driver, append the following line to /kernel/drv/mpt.conf and reboot the system:
mpt_doneq_thread_n_prop=8;
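The underscore parameters can also be set through the spfile before bouncing the instance, and V$LOG reports the block size of each redo log group once the new logs are in place. A minimal sketch, assuming an spfile is in use (V$LOG exposes the BLOCKSIZE column in 11g R2 and later):
SQL> ALTER SYSTEM SET "_disk_sector_size_override"=TRUE SCOPE=SPFILE;
SQL> ALTER SYSTEM SET "_simulate_disk_sectorsize"=4096 SCOPE=SPFILE;
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP
SQL> SELECT group#, thread#, blocksize, bytes/1048576 AS mb, status FROM v$log;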
This tip is applicable to all flash storage that Oracle sells or has sold, including the F20/F40 PCIe cards and the F5100 flash storage array. The sd-config-list entry in sd.conf may need adjustment to reflect the correct vendor ID and product ID.
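On Solaris, one way to look up the vendor and product ID strings for the sd-config-list entry is iostat -En, which prints them for every attached device (a quick sketch):
# iostat -En | egrep "Vendor|Product"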