读书人

database and os was hang with ORA-2

发布时间: 2013-01-04 10:04:14 作者: rapoo

database and os was hang with ORA-240 ERROR

Environment: AIX 6.1 ,TWO node rac,oracle 10.2.0.4


Symptom :

用户反映无法打开MES程序,连接到2号节点查看,发现sqlplus /as sysdba

回车后卡死,cancel不掉?重新开一个连接,到root账户下 crsctl check crs 命令也卡死,lsnrctl status 命令卡死,OS命令正常。

1号节点正常。

Alterlog有如下报错


Sat Nov 3 16:37:32 2012

Thread 2 advanced to log sequence 55778 (LGWR switch)

Current log# 8 seq# 55778 mem# 0: +SFC3DB_DATA/sfc3db/onlinelog/group_8.304.722515055

Current log# 8 seq# 55778 mem# 1: +SFC3DB_ARCH/sfc3db/onlinelog/group_8.264.722515055

Sat Nov 3 16:40:19 2012

Errors in file /apps/oracle/admin/sfc3db/bdump/sfc3db2_arc0_1843352.trc:

ORA-00240: control file enqueue held for more than 120 seconds

Sat Nov 3 16:42:29 2012

Errors in file /apps/oracle/admin/sfc3db/bdump/sfc3db2_arc1_565424.trc:

ORA-00240: control file enqueue held for more than 120 seconds

Sat Nov 3 16:47:00 2012

kkjcre1p: unable to spawn jobq slave process

Sat Nov 3 16:47:00 2012

Sat Nov 3 17:04:22 2012

ksvsubmit: Process(O001) creation failed



系统应该是无法分配新的进程了,所以这些命令都卡死, 但是当时OS free的内存还有10多G,也远没有达到最大进程数。

我最后的解决办法是直接reboot OS,重启后OK, 各位大神有遇到过这种情况的吗?需要什么LOG 我发上来。

如下是trc文件

sfc3rac2:/apps/oracle/admin/sfc3db/bdump$ cat /apps/oracle/admin/sfc3db/bdump/sfc3db2_cjq0_5861420.trc

Dump file /apps/oracle/admin/sfc3db/bdump/sfc3db2_cjq0_5861420.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Data Mining

and Real Application Testing options

ORACLE_HOME = /apps/oracle/product/10.2.0

System name: AIX

Node name: sfc3rac2

Release: 1

Version: 6

Machine: 00C8B9724C00

Instance name: sfc3db2

Redo thread mounted by this instance: 2

Oracle process number: 15

Unix process pid: 5861420, image: oracle@sfc3rac2 (CJQ0)


*** 2012-11-03 06:00:41.427

*** SERVICE NAMESYS$BACKGROUND) 2012-11-03 06:00:41.419

*** SESSION ID2192.1) 2012-11-03 06:00:41.419

GATHER_STATS_JOB: Stopped by Scheduler.

Consider increasing the maintenance window duration if this happens frequently.

The following objects/segments were not analyzed due to timeout:

*** 2012-11-03 16:31:39.479

Waited for process J000 to initialize for 60 seconds

*** 2012-11-03 16:31:39.479

Dumping diagnostic information for J000:



OS pid = 917522

loadavg : 3.77 3.49 3.60

swap info: free_mem = 9427.14M rsv = 64.00M

alloc = 241.15M avail = 16384.00M swap_free = 16142.85M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

1040005 A oracle 917522 1 0 60 20 1ad230590 89196 16:30:38 - 0:00 [oracle]

skgpgpstack: read() for cmd /bin/sh -c '/usr/bin/procstack 917522 2>&1' timed out after 60 seconds

*** 2012-11-03 16:33:40.701

*** 2012-11-03 16:33:50.907

Waited for process J000 to initialize for 70 seconds

*** 2012-11-03 16:33:50.907

Dumping diagnostic information for J000:

OS pid = 917522

loadavg : 2.86 3.25 3.49

swap info: free_mem = 9348.35M rsv = 64.00M

alloc = 241.11M avail = 16384.00M swap_free = 16142.89M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

1040005 A oracle 917522 1 0 60 20 1ad230590 89196 16:30:38 - 0:00 [oracle]

skgpgpstack: read() for cmd /bin/sh -c '/usr/bin/procstack 917522 2>&1' timed out after 60 seconds

*** 2012-11-03 16:35:52.258

*** 2012-11-03 16:36:02.459

Waited for process J000 to initialize for 80 seconds

*** 2012-11-03 16:36:02.459

Dumping diagnostic information for J000:

OS pid = 917522

loadavg : 2.74 3.00 3.36

swap info: free_mem = 9385.39M rsv = 64.00M

alloc = 241.10M avail = 16384.00M swap_free = 16142.90M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

1040005 A oracle 917522 1 0 60 20 1ad230590 89196 16:30:38 - 0:00 [oracle]



skgpgpstack: read() for cmd /bin/sh -c '/usr/bin/procstack 917522 2>&1' timed out after 60 seconds

*** 2012-11-03 16:38:03.960

*** 2012-11-03 16:38:14.165

Waited for process J000 to initialize for 90 seconds

*** 2012-11-03 16:38:14.165

Dumping diagnostic information for J000:

OS pid = 917522

loadavg : 2.56 2.96 3.30

swap info: free_mem = 9440.27M rsv = 64.00M

alloc = 241.00M avail = 16384.00M swap_free = 16143.00M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

1040005 A oracle 917522 1 0 60 20 1ad230590 89196 16:30:38 - 0:00 [oracle]

skgpgpstack: read() for cmd /bin/sh -c '/usr/bin/procstack 917522 2>&1' timed out after 60 seconds

*** 2012-11-03 16:40:16.175

*** 2012-11-03 16:40:26.384

Waited for process J000 to initialize for 100 seconds

*** 2012-11-03 16:40:26.384

Dumping diagnostic information for J000:

OS pid = 917522

loadavg : 2.82 2.90 3.23

swap info: free_mem = 9488.88M rsv = 64.00M

alloc = 240.96M avail = 16384.00M swap_free = 16143.04M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

1040005 A oracle 917522 1 0 60 20 1ad230590 89196 16:30:38 - 0:00 [oracle]

skgpgpstack: read() for cmd /bin/sh -c '/usr/bin/procstack 917522 2>&1' timed out after 60 seconds

*** 2012-11-03 16:42:29.100

*** 2012-11-03 16:42:39.305

Waited for process J000 to initialize for 110 seconds

*** 2012-11-03 16:42:39.305

Dumping diagnostic information for J000:

OS pid = 917522

loadavg : 4.05 3.26 3.32

swap info: free_mem = 9640.50M rsv = 64.00M

alloc = 240.90M avail = 16384.00M swap_free = 16143.10M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD



1040005 A oracle 917522 1 0 60 20 1ad230590 89196 16:30:38 - 0:00 [oracle]

skgpgpstack: read() for cmd /bin/sh -c '/usr/bin/procstack 917522 2>&1' timed out after 60 seconds

*** 2012-11-03 16:44:43.732

*** 2012-11-03 16:44:53.936

Waited for process J000 to initialize for 120 seconds

*** 2012-11-03 16:44:53.936

Dumping diagnostic information for J000:

OS pid = 917522

loadavg : 3.03 3.16 3.28

swap info: free_mem = 9667.09M rsv = 64.00M

alloc = 240.84M avail = 16384.00M swap_free = 16143.16M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

1040005 A oracle 917522 1 0 60 20 1ad230590 89196 16:30:38 - 0:00 [oracle]

skgpgpstack: read() for cmd /bin/sh -c '/usr/bin/procstack 917522 2>&1' timed out after 60 seconds

*** 2012-11-03 16:47:00.265

Killing process (ospid 917522): (reason=x4 error=0)

... and the process is still alive after kill!

*** 2012-11-03 17:05:23.892

Waited for process J000 to initialize for 60 seconds

*** 2012-11-03 17:05:23.892

Dumping diagnostic information for J000:

OS pid = 3915970

loadavg : 2.43 2.42 2.75

swap info: free_mem = 10365.04M rsv = 64.00M

alloc = 239.53M avail = 16384.00M swap_free = 16144.47M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

1040005 A oracle 3915970 1 0 60 20 105d26590 89196 17:04:22 - 0:00 [oracle]

skgpgpstack: read() for cmd /bin/sh -c '/usr/bin/procstack 3915970 2>&1' timed out after 60 seconds

*** 2012-11-03 17:07:35.954

*** 2012-11-03 17:07:46.156

Waited for process J000 to initialize for 70 seconds

*** 2012-11-03 17:07:46.156

Dumping diagnostic information for J000:



OS pid = 3915970

loadavg : 2.24 2.33 2.67

swap info: free_mem = 10400.53M rsv = 64.00M

alloc = 239.37M avail = 16384.00M swap_free = 16144.63M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

1040005 A oracle 3915970 1 0 60 20 105d26590 89196 17:04:22 - 0:00 [oracle]

skgpgpstack: read() for cmd /bin/sh -c '/usr/bin/procstack 3915970 2>&1' timed out after 60 seconds

*** 2012-11-03 17:09:58.294

*** 2012-11-03 17:10:08.495

Waited for process J000 to initialize for 80 seconds

*** 2012-11-03 17:10:08.495

Dumping diagnostic information for J000:

OS pid = 3915970

loadavg : 2.36 2.38 2.64

swap info: free_mem = 10389.09M rsv = 64.00M

alloc = 239.23M avail = 16384.00M swap_free = 16144.77M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

1040005 A oracle 3915970 1 0 60 20 105d26590 89196 17:04:22 - 0:00 [oracle]

skgpgpstack: read() for cmd /bin/sh -c '/usr/bin/procstack 3915970 2>&1' timed out after 60 seconds

*** 2012-11-03 17:12:20.685

*** 2012-11-03 17:12:30.886

Waited for process J000 to initialize for 90 seconds

*** 2012-11-03 17:12:30.886

Dumping diagnostic information for J000:



还有个trc文件太大暂时不放了 。


[解决办法]
看他给的原因好像是锁问题。
thank you
标志了以后用
[解决办法]
http://www.xifenfei.com/1838.html

LZ看看这个,操作系统bug导致ASM出现问题而引起的。

读书人网 >oracle

热点推荐