Tuesday, December 8, 2009

WTRN0078E: An attempt by the transaction manager to call start on a transactional resource has resulted in an error

Watch out when you're working with WPS and MQ (with MQ as the messaging provider), especially when MQ is on z/OS and your MQ administrator is pushing to set a disconnect interval (DISCINT) on the server-connection channel.

DISCINT allows a channel instance to time out if there is no activity on the channel for a specified interval. This setting looks good from the MQ administrator's point of view: he or she does not want channel instances to accumulate in production (especially on a queue manager shared by several projects).

But be aware that this configuration can really hurt your WAS/WPS environment: because the server end of the connection handle has already been disconnected by MQ (the DISCINT timeout), your listener port is restarted every time an MDB session tries to pick a message off the queue after the disconnect interval has expired.

-------------------------------------------------------------------------------------
We had a J2EE application comprising stateless session beans and MDBs, deployed on WebSphere Process Server 6.0.2.5.

The messaging provider for WPS is WebSphere MQ on z/OS. We sometimes see XA errors with respect to MQ.
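
For context, here is a minimal EJB 2.1-style MDB of the kind involved. The bean and queue names echo the WMSG0018E log entry below; the body is an illustrative sketch, not our actual application code.

package com.example.mdb; // hypothetical package

import javax.ejb.MessageDrivenBean;
import javax.ejb.MessageDrivenContext;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

public class ImportMDB implements MessageDrivenBean, MessageListener {

    private MessageDrivenContext ctx;

    public void setMessageDrivenContext(MessageDrivenContext ctx) {
        this.ctx = ctx;
    }

    public void ejbCreate() {
    }

    public void ejbRemove() {
    }

    // The listener port delivers each message inside a container-managed XA
    // transaction; the MQGET itself is enlisted in that transaction, which is
    // where the XAResource.start() seen in the stack traces below is driven from.
    public void onMessage(Message msg) {
        try {
            if (msg instanceof TextMessage) {
                String body = ((TextMessage) msg).getText();
                // ... business processing of the message body ...
            }
        } catch (Exception e) {
            // Mark the XA transaction for rollback so the message is redelivered.
            ctx.setRollbackOnly();
        }
    }
}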

-------------------------------------------------------------------------------------

There is a well-understood cause for these MQ-related XA errors; they make the listener port restart, producing messages like these:


[7/14/09 9:42:39:186 PDT] 0000005a XATransaction E J2CA0027E: An exception occ
JMSManagedConnection@194476453, within transaction ID {XidImpl: formatId(57415349ec0588062b4c152a8f24de408b033938e658132000001227a24f8ae0000000100000d5a9ec05880
.transaction.xa.XAException: XA operation failed, see errorCode
at com.ibm.mq.MQXAResource.start(MQXAResource.java:447

[7/14/09 9:42:39:222 PDT] 0000005a RegisteredRes E WTRN0078E: An attempt by the transaction manager to call start on a transactional resource has resulted in an error. The error code was XAER_RMERR. The exception stack trace follows: javax.transaction.xa.XAException: XA operation failed, see errorCode
at com.ibm.mq.MQXAResource.start(MQXAResource.java:447)
at com.ibm.ejs.jms.JMSManagedSession$JMSXAResource.start(JMSManagedSession.java:1318)
at com.ibm.ejs.j2c.XATransactionWrapper.start(XATransactionWrapper.java(
at com.ibm.ws.Transaction.JTA.JTAResourceBase.start(JTAResourceBase.java(Compiled Code))
... 21 more
while trying to enlist resources from datasource JMS$QCF$JMSManagedConnection@194476453 with the Transaction Manager for the current transaction, and threw a ResourceException.
[7/14/09 9:42:39:281 PDT] 0000005a ConnectionEve A J2CA0056I: The Connection Manager received a fatal connection error from the Resource Adaptor for resource JMS$QCF$JMSManagedConnection@194476453. The exception which was received is javax.jms.JMSException: Enlist failed
[7/14/09 9:42:39:284 PDT] 0000005a ConnectionEve A J2CA0056I: The Connection Manager received a fatal connection error from the Resource Adaptor for resource jms/QCF. The exception which was received is javax.jms.JMSException: Enlist failed
[7/14/09 9:42:39:316 PDT] 0000005a JMSExceptionL E WMSG0018E: Error on JMSConnection for MDB ImportMDB , JMSDestination jms/ImportingQ : javax.jms.JMSException: Enlist failed
at com.ibm.ejs.jms.JMSCMUtils.mapToJMSException(JMSCMUtils.java:135)
at com.ibm.ejs.jms.JMSManagedSession.enlist(JMSManagedSession.java:931)
at com.ibm.ejs.jms.JMSSessionHandle.enlist(JMSSessionHandle.java:1002)
at com.ibm.ejs.jms.listener.ServerSession.onMessage(ServerSession.java:611)
at com.ibm.ejs.jms.listener.ServerSession.dispatch(ServerSession.java:594)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)



With WAS v6.0.2 using a ListenerPort to drive an MDB from a WebSphere MQ queue, the application server makes a connection to the Queue Manager. The default behaviour is that when a message is found on the WMQ queue, a new connection is made to the Queue Manager to retrieve the message, process it, and pass it to the MDB. When the MDB processing has completed, this new MQ connection is put into a WAS-controlled pool, ready to be called upon to get the next message off the WMQ queue; there will be no activity on this MQ connection unless a message requires processing (the managed pooling pattern is sketched after this list). The ListenerPort's own MQ connection is separate: it is always active and does not get disconnected. The important point to note here is that this connection will continue to exist for the lifetime of the ListenerPort; it will not time out or be discarded unless:
(a) The ListenerPort is restarted
(b) The connection to the Queue Manager is found to be disconnected.
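
The same WAS-controlled pool also backs application code that uses the queue connection factory, so the point about pooling matters beyond the ListenerPort: closing a managed connection returns it to the pool rather than driving an MQDISC. A hedged sketch of that managed pattern, reusing the jms/QCF and jms/ImportingQ names from the logs above (the send logic itself is illustrative):

import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class PooledSendExample {

    public void send(String text) throws Exception {
        InitialContext ctx = new InitialContext();
        QueueConnectionFactory qcf = (QueueConnectionFactory) ctx.lookup("jms/QCF");
        Queue queue = (Queue) ctx.lookup("jms/ImportingQ");

        // On WAS this may hand back a pooled connection that has been sitting
        // idle in the pool -- exactly the kind of connection DISCINT can cut.
        QueueConnection conn = qcf.createQueueConnection();
        try {
            QueueSession session = conn.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueSender sender = session.createSender(queue);
            TextMessage msg = session.createTextMessage(text);
            sender.send(msg);
        } finally {
            conn.close(); // returns the connection to the WAS pool; no MQDISC flows
        }
    }
}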

When a DISCINT value is used on the Queue Manager, the socket is closed on the Queue Manager side once the DISCINT timeout is reached with no activity observed on the socket. However, when this happens, there is nothing to let the WAS-pooled WMQ connection know that the socket has been closed. As a consequence, the next time the connection is used it receives a 2009 return code (MQRC_CONNECTION_BROKEN) and the ListenerPort stops.
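
When the stale connection is eventually used, the failure surfaces in application code as a javax.jms.JMSException with the MQ error chained as the linked exception. A small sketch of unwrapping that chain to spot reason code 2009 (MQException and its reasonCode field are from the MQ classes for Java; the helper itself is hypothetical):

import javax.jms.JMSException;

import com.ibm.mq.MQException;

public final class MQReasonCode {

    // Walk the linked-exception chain of a JMSException to find the
    // underlying MQ reason code (2009 = MQRC_CONNECTION_BROKEN).
    public static boolean isConnectionBroken(JMSException jmse) {
        Exception linked = jmse.getLinkedException();
        while (linked != null) {
            if (linked instanceof MQException) {
                // MQException exposes the MQI reason code as a public field.
                return ((MQException) linked).reasonCode == 2009;
            }
            linked = (linked instanceof JMSException)
                    ? ((JMSException) linked).getLinkedException()
                    : null;
        }
        return false;
    }
}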

A simplified timeline of events to illustrate the problem is as follows:

(1) ListenerPort thread: Message detected on WMQ queue
(2) ListenerPort thread: Request pooled MQ connection to retrieve message and process in MDB
(3) MQConnection_1: Start XA transaction, get message, process in MDB. End XA transaction.
(4) ... 30-minute delay (say) before another message needs to be processed by MQConnection_1
(5) WMQ Queue Manager closes the socket, as the DISCINT time has been reached for the connection
(6) ListenerPort thread: Message detected on WMQ queue
(7) ListenerPort thread: Request pooled MQConnection_1 to retrieve message and process in MDB
(8) MQConnection_1: Request XA transaction start boundary.
(9) MQConnection_1: Broken connection to Queue Manager detected (RC 2009), throwing the XA error messages seen in the logs.
(10) Stop ListenerPort.
(11) 60-second wait
(12) Start ListenerPort
(13) ListenerPort thread: Message detected on WMQ queue
(14) ListenerPort thread: Request pooled MQ connection to retrieve message and process in MDB
(15) Pool is empty, so a new MQ connection is made to the Queue Manager
(16) MQConnection_2: Start XA transaction, get message, process in MDB. End XA transaction.

So the overall effect of using the DISCINT value on the Queue Manager in this scenario is to cause the ListenerPort to restart (with a 60-second delay, which is a configurable value; see http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/umb_prolscp.html), with a number of associated error messages printed to the logs.

Note also that the issue can still be seen even when the interval between messages appearing on the WMQ queue is shorter than the DISCINT value. Suppose two MQ connections had been created and stored in the WAS pool (from previously processing two messages on the WMQ queue simultaneously). If there was a 15-minute wait before another message appeared, which was processed by MQConnection_1, then another 15-minute wait before the next message appeared, MQConnection_2 would be disconnected by the Queue Manager even though a message has been put on the queue every 15 minutes (a toy simulation of this pattern follows).
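
To make that arithmetic concrete, here is a toy simulation of the pooling pattern just described. It assumes, purely for illustration, that the pool always hands out the most recently returned connection (LIFO), which is what starves MQConnection_2:

import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: a message arrives every 15 minutes, DISCINT is 30 minutes,
// and the pool is assumed to be LIFO. MQConnection_1 ends up handling every
// message while MQConnection_2 idles past DISCINT despite the steady traffic.
public class DiscintPoolDemo {

    static final int DISCINT_MINUTES = 30;

    public static void main(String[] args) {
        Deque<long[]> pool = new ArrayDeque<long[]>(); // {connectionId, lastUsedMinute}
        pool.push(new long[] { 2, 0 }); // MQConnection_2, last used at t=0
        pool.push(new long[] { 1, 0 }); // MQConnection_1, on top of the pool

        for (int t = 15; t <= 60; t += 15) { // one message every 15 minutes
            long[] conn = pool.pop();        // LIFO: most recently used first
            System.out.println("t=" + t + "m: MQConnection_" + conn[0]
                    + " serves the message (idle " + (t - conn[1]) + "m)");
            conn[1] = t;
            pool.push(conn);
        }

        // Whatever is left at the bottom of the pool has been idle the whole
        // time; the Queue Manager closed its socket long ago.
        for (long[] conn : pool) {
            long idle = 60 - conn[1];
            if (idle >= DISCINT_MINUTES) {
                System.out.println("MQConnection_" + conn[0] + " idle " + idle
                        + "m >= DISCINT -> next use fails with RC 2009");
            }
        }
    }
}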

The simplest solution is to discontinue the use of the DISCINT value on the Queue Manager. Setting it to 0 (in MQSC: ALTER CHANNEL('YOUR.SVRCONN') CHLTYPE(SVRCONN) DISCINT(0), substituting the real server-connection channel name) prevents the Queue Manager from closing the sockets, and thus prevents the problem. It is worth asking whether there is a genuine reason to use this attribute at all.

Further information about DISCINT can be found in the following article in the Information Center:

http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/topic/com.ibm.mq.csqzae.doc/ic11690_.htm

The WMQ heartbeat mechanism does not come into play in this scenario: the MQGET calls are made with short wait intervals, so the HBINT interval is never reached.

----------------------------------------------------------------------
