I have been running an Intel motherboard with hardware RAID and 6 x 1TB drives for about four years. The server stopped booting a few weeks ago and I am trying to resurrect it and, if possible, salvage the data. The controller was not even showing up during POST, so I moved it to another board I had around. It shows up there, but it is still not working correctly. The original system was running CentOS 5.x; the controller is now in a system running CentOS 6.6.
Issue: the RAID 5 volume on the SRCSAS18E controller is not showing up. I cannot enter the controller BIOS with "Ctrl+G" during POST (it pauses for about 45 seconds during POST). The OS driver is installed, as are the CLI and the web tool, but the controller does not show up in the web tool. When I issue the command to re-flash the firmware (likely it would be an upgrade, as I don't recall the last time I flashed firmware on it), I do not get a very useful response.
Hardware:
[root@titan1 ~]# lspci
00:00.0 Host bridge: NVIDIA Corporation C55 Host Bridge (rev a2)
00:00.1 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:00.2 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:00.3 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:00.4 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:00.5 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a2)
00:00.6 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:00.7 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.0 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.1 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.2 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.3 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.4 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.5 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.6 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:02.0 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:02.1 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:02.2 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:03.0 PCI bridge: NVIDIA Corporation C55 PCI Express bridge (rev a1)
00:06.0 PCI bridge: NVIDIA Corporation C55 PCI Express bridge (rev a1)
00:07.0 PCI bridge: NVIDIA Corporation C55 PCI Express bridge (rev a1)
00:09.0 RAM memory: NVIDIA Corporation MCP51 Host Bridge (rev a2)
00:0a.0 ISA bridge: NVIDIA Corporation MCP51 LPC Bridge (rev a3)
00:0a.1 SMBus: NVIDIA Corporation MCP51 SMBus (rev a3)
00:0a.2 RAM memory: NVIDIA Corporation MCP51 Memory Controller 0 (rev a3)
00:0b.0 USB controller: NVIDIA Corporation MCP51 USB Controller (rev a3)
00:0b.1 USB controller: NVIDIA Corporation MCP51 USB Controller (rev a3)
00:0d.0 IDE interface: NVIDIA Corporation MCP51 IDE (rev a1)
00:0e.0 RAID bus controller: NVIDIA Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 RAID bus controller: NVIDIA Corporation MCP51 Serial ATA Controller (rev a1)
00:10.0 PCI bridge: NVIDIA Corporation MCP51 PCI Bridge (rev a2)
00:10.1 Audio device: NVIDIA Corporation MCP51 High Definition Audio (rev a2)
00:14.0 Bridge: NVIDIA Corporation MCP51 Ethernet Controller (rev a3)
01:00.0 PCI bridge: Intel Corporation 80333 Segment-A PCI Express-to-PCI Express Bridge
01:00.2 PCI bridge: Intel Corporation 80333 Segment-B PCI Express-to-PCI Express Bridge
02:0e.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1068
06:08.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA 1064SG [Mystique] (rev 02)
06:0a.0 Ethernet controller: Intel Corporation 82544EI Gigabit Ethernet Controller (Copper) (rev 02)
[root@titan1 ~]# lspci -vv
<snip>
02:0e.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1068
    Subsystem: Intel Corporation RAID Controller SRCSAS18E
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at cfef0000 (32-bit, prefetchable) [size=64K]
    Region 2: Memory at cfdc0000 (32-bit, non-prefetchable) [size=128K]
    [virtual] Expansion ROM at cfe00000 [disabled] [size=32K]
    Capabilities: [c0] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [d0] MSI: Enable- Count=1/2 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [e0] PCI-X non-bridge device
        Command: DPERE- ERO- RBC=512 OST=4
        Status: Dev=02:0e.0 64bit+ 133MHz+ SCD- USC- DC=bridge DMMRBC=1024 DMOST=4 DMCRS=16 RSCEM- 266MHz- 533MHz-
    Kernel modules: megaraid_sas
[root@titan1 CmdTool2]# dmesg |less
<snip>
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: ATA-8: WDC WD2500BEKT-60PVMT0, 01.01A01, max UDMA/133
ata4.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata4.00: configured for UDMA/133
ata6: SATA link down (SStatus 0 SControl 300)
megasas: 06.803.01.00-rh1 Mon. Mar. 10 17:00:00 PDT 2014
megasas: 0x1000:0x0411:0x8086:0x1001: bus 2:slot 14:func 0
megaraid_sas 0000:02:0e.0: enabling device (0080 -> 0082)
ACPI: PCI Interrupt Link [AXV7] enabled at IRQ 16
alloc irq_desc for 16 on node -1
alloc kstat_irqs on node -1
megaraid_sas 0000:02:0e.0: PCI INT A -> Link[AXV7] -> GSI 16 (level, low) -> IRQ 16
megasas: Waiting for FW to come to ready state
sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 0:0:1:0: Attached scsi CD-ROM sr0
STARTING CRC_T10DIF
sd 2:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/232 GiB)
sd 3:0:0:0: [sdb] 488397168 512-byte logical blocks: (250 GB/232 GiB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:0:0: [sdb] Write Protect is off
sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sdb: sdb1 sdb2
sd 3:0:0:0: [sdb] Attached SCSI disk
sda1 sda2
sd 2:0:0:0: [sda] Attached SCSI disk
INFO: task modprobe:351 blocked for more than 120 seconds.
Not tainted 2.6.32-504.3.3.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D 0000000000000000 0 351 1 0x00000000
ffff880237281eb8 0000000000000082 0000000000000000 0000000000000001
ffff880237281e18 ffffffff8115c876 0000002a3dab586f ffffffff810bfcff
ffff880237281e48 00000000fffe3007 ffff88023725baf8 ffff880237281fd8
Call Trace:
[<ffffffff8115c876>] ? vfree+0x36/0x80
[<ffffffff810bfcff>] ? load_module+0x1abf/0x1cd0
[<ffffffffa003d000>] ? wait_scan_init+0x0/0xd [scsi_wait_scan]
[<ffffffff8136cbe5>] wait_for_device_probe+0x55/0x90
[<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
[<ffffffff810a5155>] ? __blocking_notifier_call_chain+0x65/0x80
[<ffffffffa003d009>] wait_scan_init+0x9/0xd [scsi_wait_scan]
[<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
[<ffffffff810bfff1>] sys_init_module+0xe1/0x250
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
FW state [0] hasn't changed in 180 secs
pcidata = 30400
megaraid_sas 0000:02:0e.0: megasas: FW restarted successfully from megasas_init_fw!
megasas: Waiting for FW to come to ready state
INFO: task modprobe:351 blocked for more than 120 seconds.
Not tainted 2.6.32-504.3.3.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D 0000000000000000 0 351 1 0x00000000
ffff880237281eb8 0000000000000082 0000000000000000 0000000000000001
ffff880237281e18 ffffffff8115c876 0000002a3dab586f ffffffff810bfcff
ffff880237281e48 00000000fffe3007 ffff88023725baf8 ffff880237281fd8
Call Trace:
[<ffffffff8115c876>] ? vfree+0x36/0x80
[<ffffffff810bfcff>] ? load_module+0x1abf/0x1cd0
[<ffffffffa003d000>] ? wait_scan_init+0x0/0xd [scsi_wait_scan]
[<ffffffff8136cbe5>] wait_for_device_probe+0x55/0x90
[<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
[<ffffffff810a5155>] ? __blocking_notifier_call_chain+0x65/0x80
[<ffffffffa003d009>] wait_scan_init+0x9/0xd [scsi_wait_scan]
[<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
[<ffffffff810bfff1>] sys_init_module+0xe1/0x250
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task modprobe:351 blocked for more than 120 seconds.
Not tainted 2.6.32-504.3.3.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D 0000000000000000 0 351 1 0x00000000
ffff880237281eb8 0000000000000082 0000000000000000 0000000000000001
ffff880237281e18 ffffffff8115c876 0000002a3dab586f ffffffff810bfcff
ffff880237281e48 00000000fffe3007 ffff88023725baf8 ffff880237281fd8
Call Trace:
[<ffffffff8115c876>] ? vfree+0x36/0x80
[<ffffffff810bfcff>] ? load_module+0x1abf/0x1cd0
[<ffffffffa003d000>] ? wait_scan_init+0x0/0xd [scsi_wait_scan]
[<ffffffff8136cbe5>] wait_for_device_probe+0x55/0x90
[<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
<snip>
I don't think the kernel messages above are anything more than long timeout values for the drives. That is odd, but I am not sure what to make of it. I can post the full kernel message output if needed.
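Since the full kernel log is long, a filter along these lines narrows it down to the controller-related messages. This is only a minimal sketch: `megasas_lines` is a made-up helper name, and the patterns are just the strings visible in the excerpt above, so other driver versions may word things differently.

```shell
# Hypothetical helper: keep only the lines that mention the MegaRAID
# driver or its firmware state. Reads the log on stdin.
megasas_lines() {
    grep -E 'megasas|megaraid_sas|FW state'
}

# Typical use on the live system (needs permission to read the kernel log):
#   dmesg | megasas_lines
```

Piping a saved copy of the log through the same function (`megasas_lines < dmesg.txt`) works as well, which is handy when posting only the relevant part.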
Installed Intel RAID services / tools:
[root@titan1 RAID]# ls
ir3_1068SASHWR_FW_v1.12.280-0826_pkg-v7.0.1-0075.zip ir3_Linux_x86_RWC2_v14.05.02.03.tar.gz Linux_x64_RWC2_v14.08.01-04.tar.gz
ir3_CmdTool2_Linux_v8.07.16.zip ir3_UEFI_CmdTool2_v2.03.00.s6.zip MR_Linux_drv_v6.705.07.00.tgz
[root@titan1 RAID]# cd /usr/local/RAID\ Web\ Console\ 2/
[root@titan1 RAID Web Console 2]# ./start
starthelp.sh startmonitorhelp.sh startupui.sh
[root@titan1 RAID Web Console 2]# ./startupui.sh
The message above just indicates a lack of connection from the web UI to some agent, as far as I can tell. What I am not clear on is which agent or service I should check is running so that this UI can connect to it.
Attempt to flash firmware to controller:
[root@titan1 tmp]# cp /root/RAID/ir3_1068SASHWR_FW_v1.12.280-0826_pkg-v7.0.1-0075.zip .
[root@titan1 tmp]# unzip ir3_1068SASHWR_FW_v1.12.280-0826_pkg-v7.0.1-0075.zip
Archive: ir3_1068SASHWR_FW_v1.12.280-0826_pkg-v7.0.1-0075.zip
inflating: update.nsh
creating: CmdTool2/DOS/
inflating: CmdTool2/DOS/CmdTool2.exe
inflating: CmdTool2/DOS/CMDTool2_DOS_v8.00.11_rel-notes.txt
inflating: CmdTool2/DOS/LICENSE_DOS32A.txt
creating: CmdTool2/Linux/
inflating: CmdTool2/Linux/CmdTool2-8.00.13-1.i386.rpm
inflating: CmdTool2/Linux/CMDTool2_Linux_v8.00.13_rel-notes.txt
inflating: CmdTool2/Linux/Lib_Utils-1.00-07.noarch.rpm
inflating: CmdTool2/Linux/Lib_Utils2-1.00-01.noarch.rpm
creating: CmdTool2/Solaris/
inflating: CmdTool2/Solaris/CmdTool2
inflating: CmdTool2/Solaris/CmdTool2.pkg
inflating: CmdTool2/Solaris/CMDTool2_Solaris_v8.00.06_rel-notes.txt
creating: CmdTool2/UEFI/
inflating: CmdTool2/UEFI/CmdTool2.efi
inflating: CmdTool2/UEFI/CMDTool2_UEFI_v2.01.00.S6_rel-notes.txt
creating: CmdTool2/Windows/
inflating: CmdTool2/Windows/CmdTool2.exe
inflating: CmdTool2/Windows/CmdTool2Support.zip
inflating: CmdTool2/Windows/CMDTool2_Windows_v8.00.11_rel-notes.txt
inflating: 68_fw826.rom
inflating: ir3_1068SASHWR_Firmware_v1.12.280-0826_readme.txt
inflating: License_v2.pdf
inflating: update.bat
inflating: 68_fw826_4MB.rom
[root@titan1 tmp]#
[root@titan1 tmp]# mv *.rom /opt/MegaRAID/CmdTool2/
[root@titan1 CmdTool2]# ./CmdTool264 -adpfwflash -f 68_fw826.rom
Invalid input at or near token 68_fw826.rom
Exit Code: 0x01
[root@titan1 CmdTool2]#
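One thing that may be worth ruling out on the flash command: MegaCli-style command lines normally end with an adapter selector such as `-a0` or `-aALL`, and the attempt above has none. The sketch below only prints the command it would run, so it is safe to try. It assumes CmdTool2 follows the usual MegaCli convention here; `flash_cmd` is a made-up helper name, and none of this has been verified against this controller. With the firmware stuck before the ready state, the flash may well fail for other reasons.

```shell
# Hypothetical dry-run helper: builds the flash command with an explicit
# adapter selector and prints it instead of executing it.
#   $1 = firmware image file
#   $2 = adapter index (defaults to ALL)
flash_cmd() {
    rom="$1"
    adapter="${2:-ALL}"
    printf './CmdTool264 -AdpFwFlash -f %s -a%s\n' "$rom" "$adapter"
}

# Print the command for adapter 0 with the image from the package above.
flash_cmd 68_fw826.rom 0
```

If the printed line looks right, it can be pasted into the shell from /opt/MegaRAID/CmdTool2/ by hand.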
Questions:
1) Does anyone have experience with this, or any direction on how to debug it further?
2) Does anyone have this RAID controller running CentOS / RHEL 6.6?
3) The inability to use <Ctrl+G> and UEFI is not good. I do have a note in my system change control about something like this, which says to get past it by "disabling all other controller BIOS on motherboard". I would like to confirm whether others have had this issue and whether there is a workaround for it.
Thanks,