http://czmmiao.iteye.com/blog/1474678昨天客户反映说在下午某时间段有几个事务失败了,让我查下当时数据库系统的负载是否正常,看了下CPU的历史负载,很正常,于是看了下日志发现有几个timeout错误和一个ORA-02050错误。 Tue Feb 15 20:44:05 2011 Error 1013 trapped in 2PC on transaction 199.40.297416. Cleaning up. Tue Feb 15 20:44:56 2011 Error stack returned to user: ORA-02050: transaction 199.40.297416 rolled back, some remote DBs may be in-doubt ORA-01013: user requested cancel of current operation 这个是分布式事务处理出错的提示,根据提示可知,可能是由于远程的数据库处理失败,导致事务号为199.40.297416的事务失败而回滚。原因可能是网络,也可能是远程节点的连接超时或是人为kill等。 Oracle给出的解释是: Error: ORA 2050 Text: transaction <num> rolled back, some remote DBs may be in-doubt ------------------------------------------------------------------------------- Cause:There was a network failure or a remote failure during the two-phase commit. Action:Notify the database administrator. The remote databases will automatically resynchronize when the failure is repaired. 大致意思为:一种可能为网络失败,另外一种是事务在两个阶段commit时,远程节点操作不成功,这时,它会记在dba_2pc_pending中,reco进程后续会自动尝试去继续完成这个事务。但如果能确定这个节点段时间内不能完成,可能就需要人工处理这个pending transaction了。 查询数据字典dba_2pc_pending和dba_2pc_neighbors均无任何记录,说明事务已经自动被恢复了。 附录: dba_2pc_pending describes distributed transactions awaiting recovery. dba_2pc_neighbors describes incoming and outgoing connections for pending transactions. 在我这个例子中,由于事务已经被自动恢复,所以无需人工干预,如果没有自动恢复的话,可以通过如下语句处理: 强制提交或回滚: SQL>commit force '199.40.297416'; 或 SQL>rollback force '199.40.297416'; 如果事务的状态为collecting,则不需要commit/rollback force,可这样处理,如下: 1、disable分布式恢复 SQL>alter system disable distributed recovery; 2、purge in-doubt transaction entry SQL>exec dbms_transaction.purge_lost_db_entry('199.40.297416'); 3、最后enable分布式恢复 SQL>alter system enable distributed recovery;