[LON-CAPA-admin] server goes unresponsive?
todd.ruskell at gmail.com
Tue Oct 23 13:42:42 EDT 2012
On Tue, Oct 23, 2012 at 10:59 AM, Stuart Raeburn <raeburn at msu.edu> wrote:
Thanks for the information about those variables and deciphering the log
>> It strikes me as very much not good that my library server has been marked
>> "DEAD". Any ideas on what can cause a host to be so marked?
> According to the documentation in loncnew:
> If a socket timeout is detected the connection retries left is
> decremented. Once the number of retries left is zero, the host is marked as
> DEAD and no further attempts will be made by that child.
> Is the situation and the logging you describe from your access server
> (i.e., the access server is unable to connect to your library server), or
> is this the library server trying to talk to itself via lonc/lond, and
> failing? I assume the former, but just checking.
That's the unfortunate thing. Everything reported here is on the library
server. This is the library server trying to talk to itself.
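
For anyone following along, the retry behaviour quoted from the loncnew documentation above can be sketched roughly as follows. This is only an illustration in Python, not loncnew's actual (Perl) code: the class name, the method names, and the starting retry count of 5 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HostConnection:
    """Hypothetical per-child connection state; not loncnew's real data structure."""
    host: str
    retries_left: int = 5   # assumed starting value; the real default is not given in this thread
    dead: bool = False

    def on_socket_timeout(self) -> None:
        """Record one socket timeout; mark the host DEAD once retries run out."""
        if self.dead:
            return          # a DEAD host is not retried by this child
        self.retries_left -= 1
        if self.retries_left <= 0:
            self.dead = True

    def may_attempt(self) -> bool:
        """True while this child is still allowed to try the host."""
        return not self.dead


def simulate(timeouts: int, retries: int = 5) -> HostConnection:
    """Feed a number of consecutive timeouts into a fresh connection."""
    conn = HostConnection(host="library-server", retries_left=retries)
    for _ in range(timeouts):
        conn.on_socket_timeout()
    return conn
```

If the real logic is anything like this, a short burst of timeouts is enough to leave a child refusing to talk to the host until it is restarted, which would fit with a loncontrol restart clearing the problem.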
Does that mean this effect is caused by some overloading of the server?
This is possible, but as best as I can tell, it's happened at different mid
to high server load levels, some of which we've experienced before without
As you probably suspect, when this happens the usual login page is replaced
by the "LON-CAPA is temporarily unavailable" page. And a simple restart
of loncontrol does get things up and running again. I suppose I could set
up a cron job to restart loncontrol every 5 minutes. But I'm also open to
> Stuart Raeburn
> LON-CAPA Academic Consortium
> Quoting Todd Ruskell <todd.ruskell at gmail.com>:
>> We've been experiencing a situation in which our library server seems to
>> suddenly go unresponsive, without much warning. In looking at logs, I see
>> a couple things.
>> First, in lonnet.log I see some entries like the following distributed
>> throughout the log file. I don't know how to interpret these entries, but
>> they seem to indicate something isn't quite right:
>> Sun Oct 7 12:00:23 2012 (21138): Starting Shut down
>> Sun Oct 7 12:00:23 2012 (21138): %badServerCache is 7
>> Sun Oct 7 12:00:23 2012 (21138): %homecache is 13510
>> Sun Oct 7 12:00:23 2012 (21138): %remembered is 7
>> Sun Oct 7 12:00:23 2012 (21138): kicks is 0
>> Sun Oct 7 12:00:23 2012 (21138): hits is 451259
>> Sun Oct 7 12:00:23 2012 (21138): Flushing log buffers
>> Sun Oct 7 12:00:23 2012 (21138): Shutting down
>> When the system seems to go unresponsive, lonnet.log has the following
>> Sun Oct 7 12:07:17 2012 (21568): <font color="blue">WARNING: Trying to
>> resource data for smarkoe at csm: con_lost</font>
>> Sun Oct 7 12:07:38 2012 (21871): <font color="blue">WARNING: Trying to
>> resource data for gajohnso at csm: con_lost</font>
>> ...above entry repeated several times and then several messages like ...
>> Sun Oct 7 12:07:40 2012 (21871): Could not devalidate spreadsheet esease
>> at csm
>> r/readingQuestions.problem: no_such_host con_lost
>> Sun Oct 7 12:07:41 2012 (21498): Could not devalidate spreadsheet jsingh
>> at csm
>> error: 100 tie(GDBM) Failed while attempting del con_lost
>> It strikes me as very much not good that my library server has been marked
>> "DEAD". Any ideas on what can cause a host to be so marked? It doesn't
>> appear to be load based, as it seems to have happened at a variety of load
>> levels, including pretty low. Any help you can provide would be greatly
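
The lonnet.log entries quoted above all appear to follow a "timestamp (pid): message" layout, so one way to see when the con_lost errors cluster is to count them per process. The sketch below is a rough Python illustration under that assumption; the regular expression is inferred only from the excerpts in this message, and the function name is made up for the example.

```python
import re
from collections import Counter

# Layout inferred from the excerpts above: "Sun Oct 7 12:07:17 2012 (21568): message"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"\((?P<pid>\d+)\): (?P<msg>.*)$"
)


def count_con_lost(lines):
    """Count lines mentioning con_lost, grouped by the pid of the owning entry."""
    counts = Counter()
    current_pid = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            current_pid = match.group("pid")
            text = match.group("msg")
        else:
            # No timestamp: treat the line as a continuation of the previous entry.
            text = line
        if current_pid is not None and "con_lost" in text:
            counts[current_pid] += 1
    return counts


sample = [
    'Sun Oct 7 12:00:23 2012 (21138): kicks is 0',
    'Sun Oct 7 12:07:17 2012 (21568): <font color="blue">WARNING: Trying to',
    'resource data for smarkoe at csm: con_lost</font>',
    'Sun Oct 7 12:07:38 2012 (21871): <font color="blue">WARNING: Trying to',
    'resource data for gajohnso at csm: con_lost</font>',
]
result = count_con_lost(sample)
```

Running this over the full log (for example with open("lonnet.log") as the line source, path adjusted to wherever the log lives on your server) would show whether the errors start in one child or in all of them at once.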