[LON-CAPA-dev] RE: LON-CAPA-dev lonc connections dying

Fox, Ron lon-capa-dev@mail.lon-capa.org
Mon, 7 Jun 2010 06:55:12 -0400


DEAD means that the lonc/lond pair had several consecutive problems and the lonc is giving up on talking to the remote system.  It can happen in several ways:
- Connection failures
- Communication failures

Just to summarize something I put up on the bug Gerd created:
- This is apparently happening because of timeouts between 
  lonc and lond.
- I suspect that the large data transfers are taking longer 
  than the timeout to get started, due to whatever processing
  is needed on the remote end to marshal the data.
- If you want to play with lonc timeouts you can do this in 
  a couple of ways:

Increasing the timeout on a transaction:

Loncom/LondConnection.pm - Locate sub new

Locate the text:

                     TimeoutValue       => 30,

Change 30 to something bigger.  That value is the number of seconds of 'dead air' that is acceptable in the middle of a lonc/lond transaction before a timeout is declared.
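
As a rough illustration of that edit (a sketch only, not the actual LondConnection.pm source: the hash below is a hypothetical stand-in for whatever sub new builds, and 120 is an arbitrary example value, not a recommendation):

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash constructed in sub new.
# Only the TimeoutValue key comes from the real file; the rest is assumed.
my $self = {
    TimeoutValue => 120,    # was 30; seconds of 'dead air' tolerated
};

print "Transaction timeout is now $self->{TimeoutValue} seconds\n";
```

In the real file only the number on the TimeoutValue line needs to change; lonc presumably has to be restarted before the new value takes effect.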

Removing transaction timeouts altogether:
If the marshalling time for data on the remote end can be unbounded, then the very idea of a timeout is bad: you can't pick a timeout that is a valid upper bound on the time between sending the request and data flowing back at you.

To remove timeouts in the middle of transactions altogether, edit loncnew:

Locate sub MakeLondConnection

Locate the line that reads:

	$Connection->SetTimeoutCallback(\&SocketTimeout);

Comment it out.  The transaction can still time out, but loncnew will never become aware of it, so instead of shutting down the socket to lond it will keep waiting for the transaction to finish.
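
For illustration, the edited line inside MakeLondConnection would then look roughly like this (surrounding code omitted; the wording of the explanatory comment is only a suggestion):

```perl
	# Timeout callback disabled: loncnew is no longer told about
	# transaction timeouts, so it keeps waiting on the lond socket.
	# $Connection->SetTimeoutCallback(\&SocketTimeout);
```

Keeping the original line as a comment rather than deleting it makes it easy to restore the old behavior if waiting indefinitely turns out to be worse than timing out.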

I would be curious to know which of these approaches best improves stability.


Ron Fox
Part-time LON-CAPA Body.