This post was last edited by ccton on 2014-2-18 12:08

[root@**** hydata]# cat /etc/redhat-release

Red Hat Enterprise Linux Server release 5.6 (Tikanga)

[root@**** hydata]# uname -a

Linux gywsj.hyb210 2.6.18-238.el5 #1 SMP Sun Dec 19 14:22:44 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

Database version:

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

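Problem description: after running for a while, the file system on the mounted disk array develops a logical error and is automatically remounted read-only.

I also once wrote large files in a loop until the disk was full, and no error was reported. After remounting, writing files works normally again.

I suspect that unstable voltage on the array is causing logical block errors on the disk, or that this is an Oracle bug, but I have not found any material that confirms either.

Could the experts here please help diagnose this?

The relevant logs follow.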

Database log:

Tue Feb 18 09:53:28 2014

Archived Log entry 18018 added for thread 1 sequence 565 ID 0x51475291 dest 1:

Tue Feb 18 10:19:41 2014

Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ckpt_30996.trc:

ORA-00206: error in writing (block 3, # blocks 1) of control file

ORA-00202: control file: '/hydata/flash_recovery_area/orcl/control02.ctl'

ORA-27072: File I/O error

Linux-x86_64 Error: 30: Read-only file system

Additional information: 4

Additional information: 3

Additional information: -1

Tue Feb 18 10:19:41 2014

KCF: read, write or open error, block=0xaa13a online=1

Tue Feb 18 10:19:41 2014

KCF: read, write or open error, block=0xa5dfd online=1

file=5 '/hydata/tablespaces/cmsservergy.dat'

Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ckpt_30996.trc:

ORA-00221: error on write to control file

ORA-00206: error in writing (block 3, # blocks 1) of control file

ORA-00202: control file: '/hydata/flash_recovery_area/orcl/control02.ctl'

ORA-27072: File I/O error

Linux-x86_64 Error: 30: Read-only file system

Additional information: 4

Additional information: 3

Additional information: -1

file=5 '/hydata/tablespaces/cmsservergy.dat'

Tue Feb 18 10:19:41 2014

KCF: read, write or open error, block=0x21593e online=1

error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system

CKPT (ospid: 30996): terminating the instance due to error 221

file=10 '/hydata/tablespaces/cmsservergy4.dat'

error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system

Additional information: 4

error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system

Additional information: 4

Tue Feb 18 10:19:41 2014

KCF: read, write or open error, block=0x153877 online=1

Additional information: 4

Additional information: 696634

file=10 '/hydata/tablespaces/cmsservergy4.dat'

Additional information: 2185534

Additional information: 679421

Additional information: -1'

error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system

Additional information: -1'

Additional information: -1'

Additional information: 4

Additional information: 1390711

Additional information: -1'

Tue Feb 18 10:19:41 2014

Some DDE async actions failed or were cancelled

Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_lgwr_30992.trc:

ORA-00345: redo log write error block 193051 count 13

ORA-00312: online log 2 thread 1: '/hydata/orcl/redo02.log'

ORA-27072: File I/O error

Linux-x86_64 Error: 5: Input/output error

Additional information: 4

Additional information: 193051

Additional information: -1

Tue Feb 18 10:19:42 2014

opiodr aborting process unknown ospid (11121) as a result of ORA-1092

Tue Feb 18 10:19:42 2014

ORA-1092 : opitsk aborting process

Instance terminated by CKPT, pid = 30996
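
The OS error numbers embedded in the alert log above can be extracted and named to see what the operating system reported back to Oracle. This is only a minimal sketch: the sample lines are copied from the log above, `os_errors` is a hypothetical helper name (not an Oracle tool), and the errno names are assumed to follow Linux numbering, as the `Linux-x86_64` prefix suggests.

```python
import errno
import re
from collections import Counter

# Sample lines copied from the alert log above.
ALERT_LOG = """\
Linux-x86_64 Error: 30: Read-only file system
error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system
Linux-x86_64 Error: 5: Input/output error
"""

# Matches the "Linux-x86_64 Error: <errno>: <message>" fragment.
OS_ERROR = re.compile(r"Linux-x86_64 Error: (\d+): (.+)")


def os_errors(text):
    """Count OS-level errors by (errno number, symbolic name)."""
    counts = Counter()
    for match in OS_ERROR.finditer(text):
        number = int(match.group(1))
        # errno.errorcode maps numbers to names on the platform running this
        # script; it is assumed to match the Linux host that wrote the log.
        name = errno.errorcode.get(number, "UNKNOWN")
        counts[(number, name)] += 1
    return counts


if __name__ == "__main__":
    for (number, name), count in sorted(os_errors(ALERT_LOG).items()):
        print(f"errno {number} ({name}): {count} occurrence(s)")
```

Both numbers are errors the kernel returned to Oracle's I/O calls, which suggests looking at the storage stack before treating this as a database bug.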

Operating system log:

Feb 18 10:19:05 gywsj kernel: INFO: task extract:32604 blocked for more than 120 seconds.

Feb 18 10:19:05 gywsj kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Feb 18 10:19:05 gywsj kernel: extract       D ffffffff80153806     0 32604   9994               32577 (NOTLB)

Feb 18 10:19:05 gywsj kernel:  ffff8101d9957b78 0000000000000082 ffff810001059800 0000000000000000

Feb 18 10:19:05 gywsj kernel:  ffffffff804d3480 000000000000000a ffff8102c9a5e080 ffff81087fffb080

Feb 18 10:19:05 gywsj kernel:  00067b698e9d15a4 000000000000322f ffff8102c9a5e268 0000001a00000000

Feb 18 10:19:05 gywsj kernel: Call Trace:

Feb 18 10:19:05 gywsj kernel:  [] do_gettimeofday+0x40/0x90

Feb 18 10:19:05 gywsj kernel:  [] sync_page+0x0/0x43

Feb 18 10:19:05 gywsj kernel:  [] io_schedule+0x3f/0x67

Feb 18 10:19:05 gywsj kernel:  [] sync_page+0x3e/0x43

Feb 18 10:19:05 gywsj kernel:  [] __wait_on_bit+0x40/0x6e

Feb 18 10:19:05 gywsj kernel:  [] wait_on_page_bit+0x6c/0x72

Feb 18 10:19:05 gywsj kernel:  [] wake_bit_function+0x0/0x23

Feb 18 10:19:05 gywsj kernel:  [] pagevec_lookup_tag+0x1a/0x21

Feb 18 10:19:05 gywsj kernel:  [] wait_on_page_writeback_range+0x62/0x12e

Feb 18 10:19:05 gywsj kernel:  [] do_writepages+0x29/0x2f

Feb 18 10:19:05 gywsj kernel:  [] __filemap_fdatawrite_range+0x50/0x5b

Feb 18 10:19:05 gywsj kernel:  [] filemap_write_and_wait+0x26/0x31

Feb 18 10:19:05 gywsj kernel:  [] generic_file_direct_IO+0x81/0x122

Feb 18 10:19:05 gywsj kernel:  [] __generic_file_aio_read+0xb8/0x198

Feb 18 10:19:05 gywsj kernel:  [] generic_file_aio_read+0x34/0x39

Feb 18 10:19:05 gywsj kernel:  [] do_sync_read+0xc7/0x104

Feb 18 10:19:05 gywsj kernel:  [] autoremove_wake_function+0x0/0x2e

Feb 18 10:19:05 gywsj kernel:  [] hrtimer_cancel+0xc/0x16

Feb 18 10:19:05 gywsj kernel:  [] hrtimer_nanosleep+0x58/0x118

Feb 18 10:19:05 gywsj kernel:  [] vfs_read+0xcb/0x171

Feb 18 10:19:05 gywsj kernel:  [] sys_read+0x45/0x6e

Feb 18 10:19:05 gywsj kernel:  [] tracesys+0xd5/0xe0

Feb 18 10:19:05 gywsj kernel:

Feb 18 10:19:41 gywsj kernel: sd 3:0:0:1: timing out command, waited 360s

Feb 18 10:19:41 gywsj kernel: sd 3:0:0:1: SCSI error: return code = 0x060d0000

Feb 18 10:19:41 gywsj kernel: end_request: I/O error, dev sdc, sector 662064886

Feb 18 10:19:41 gywsj kernel: Buffer I/O error on device sdc5, logical block 82758095

Feb 18 10:19:41 gywsj kernel: lost page write due to I/O error on sdc5

Feb 18 10:19:41 gywsj kernel: Buffer I/O error on device sdc5, logical block 82758096

Feb 18 10:19:41 gywsj kernel: lost page write due to I/O error on sdc5

Feb 18 10:19:41 gywsj kernel: Aborting journal on device sdc5.

Feb 18 10:19:41 gywsj kernel: ext3_abort called.

Feb 18 10:19:41 gywsj kernel: EXT3-fs error (device sdc5): ext3_journal_start_sb: Detected aborted journal

Feb 18 10:19:41 gywsj kernel: Remounting filesystem read-only
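
The `SCSI error: return code = 0x060d0000` line in the kernel log above can be split into its component bytes. The sketch below assumes the result-word layout and constant names from the Linux kernel's `include/scsi/scsi.h` for 2.6-era kernels. The tables are written from memory and should be checked against the headers of the running kernel; `decode_scsi_result` is a hypothetical helper name.

```python
# Result word layout assumed from include/scsi/scsi.h (2.6-era kernels):
#   bits 31-24 driver byte, 23-16 host byte, 15-8 message byte, 7-0 status byte
DRIVER_BYTES = {
    0x00: "DRIVER_OK", 0x01: "DRIVER_BUSY", 0x02: "DRIVER_SOFT",
    0x03: "DRIVER_MEDIA", 0x04: "DRIVER_ERROR", 0x05: "DRIVER_INVALID",
    0x06: "DRIVER_TIMEOUT", 0x07: "DRIVER_HARD", 0x08: "DRIVER_SENSE",
}
HOST_BYTES = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
    0x09: "DID_BAD_INTR", 0x0A: "DID_PASSTHROUGH", 0x0B: "DID_SOFT_ERROR",
    0x0C: "DID_IMM_RETRY", 0x0D: "DID_REQUEUE",
}


def decode_scsi_result(code):
    """Split a 32-bit SCSI result word into its four component bytes."""
    return {
        "driver": DRIVER_BYTES.get((code >> 24) & 0xFF, "unknown"),
        "host": HOST_BYTES.get((code >> 16) & 0xFF, "unknown"),
        "msg": (code >> 8) & 0xFF,
        "status": code & 0xFF,
    }


if __name__ == "__main__":
    # Return code copied from the kernel log above.
    print(decode_scsi_result(0x060D0000))
```

Under these assumptions the code decodes to a driver-level timeout with the command requeued, and a zero SCSI status byte. That is consistent with the device at 3:0:0:1 not answering within the 360 s window, as opposed to the disk reporting a medium error. It does not by itself say whether the array, the cabling, or the HBA is at fault.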

-- After remounting, files can be written again

Operating system log:

Feb 18 10:47:35 gywsj kernel: __journal_remove_journal_head: freeing b_frozen_data

Feb 18 10:47:35 gywsj last message repeated 2 times

Feb 18 10:47:35 gywsj kernel: ext3_abort called.

Feb 18 10:47:35 gywsj kernel: EXT3-fs error (device sdc5): ext3_put_super: Couldn't clean up the journal

Feb 18 10:47:51 gywsj kernel: kjournald starting.  Commit interval 5 seconds

Feb 18 10:47:51 gywsj kernel: EXT3-fs warning (device sdc5): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure

Feb 18 10:47:51 gywsj kernel: EXT3-fs warning (device sdc5): ext3_clear_journal_err: Marking fs in need of filesystem check.

Feb 18 10:47:51 gywsj kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended

Feb 18 10:47:51 gywsj kernel: EXT3 FS on sdc5, internal journal

Feb 18 10:47:51 gywsj kernel: EXT3-fs: recovery complete.

Feb 18 10:47:51 gywsj kernel: EXT3-fs: mounted filesystem with ordered data mode.
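
As a rough cross-check of the earlier kernel messages, the failed disk sector (`dev sdc, sector 662064886`) and the first failed file system block (`sdc5, logical block 82758095`) can be compared. The sketch assumes 512-byte sectors and a 4096-byte ext3 block size, neither of which is shown in the logs. `implied_partition_start` is a hypothetical helper name.

```python
SECTOR_SIZE = 512      # bytes; assumed, not shown in the logs
FS_BLOCK_SIZE = 4096   # bytes; assumed ext3 block size, not shown in the logs


def implied_partition_start(disk_sector, fs_block):
    """Return the partition start sector implied by a whole-disk sector
    number and the partition-relative file system block it belongs to."""
    sectors_per_block = FS_BLOCK_SIZE // SECTOR_SIZE
    return disk_sector - fs_block * sectors_per_block


if __name__ == "__main__":
    # Values copied from the kernel log: end_request sector on sdc and the
    # first "Buffer I/O error" logical block on sdc5.
    print(implied_partition_start(662064886, 82758095))
```

With these assumptions the difference comes out to 126 sectors. That is a plausible start for a first logical partition such as sdc5 under the traditional 63-sector alignment, although the partition table is not shown here. If that holds, the buffer errors and the SCSI request failure refer to the same location, so the ext3 journal abort follows directly from the timed-out write.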
