[LON-CAPA-dev] lonc connections dying

Mark Lucas lon-capa-dev@mail.lon-capa.org
Wed, 2 Jun 2010 23:11:56 -0400


Hi,

I've been fighting lonc connections dying all quarter. With a change in textbook, we've
been using lots of problems from other domains in several of our courses.

We get the "Unable to find ......." error popping up a lot, and particularly a lot over
the last week.

I'm finally diving into this and will be checking out logs over the next couple days.

In the meantime, can anyone tell me what can cause a "DEAD" lonc connection?
I do a ps aux and find lonc DEAD for the offending connections. I also find some strange
error messages in /var/log/httpd/errors.

Right now, I get in and do a loncontrol reload when I find a dead connection. What would
happen if I just killed the dead lonc process - would it then try to restart?


Here are some samples:
from httpd/errors

[Wed Jun 02 22:05:11 2010] [error] access to /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem failed for 184.57.76.249, reason: Invalid symb for /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem: uploaded/ohiou/8j176084101734b7coucapa2/default_1236609481.sequence___19___msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem
[Wed Jun 02 22:05:11 2010] [error] access to /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem failed for 184.57.76.249, reason: Invalid Access for zm216307 domain ohiou access bre
[Wed Jun 02 22:05:17 2010] [error] [client 132.235.42.74] Apache2::RequestIO::print: (103) Software caused connection abort at /home/httpd/lib/perl//Apache/lonhomework.pm line 1010, referer: http://capa10.phy.ohiou.edu/res/ohiou/serwaylib/Chap29/Radioisotope.problem
[Wed Jun 02 22:05:17 2010] [error] [client 132.235.42.74] Apache2::RequestIO::print: (103) Software caused connection abort at /home/httpd/lib/perl//Apache/lonerrorhandler.pm line 53, referer: http://capa10.phy.ohiou.edu/res/ohiou/serwaylib/Chap29/Radioisotope.problem
[Wed Jun 02 22:06:22 2010] [error] access to /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem failed for 184.57.76.249, reason: Invalid symb for /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem: uploaded/ohiou/8j176084101734b7coucapa2/default_1236609481.sequence___19___msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem
[Wed Jun 02 22:06:22 2010] [error] access to /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem failed for 184.57.76.249, reason: Invalid Access for zm216307 domain ohiou access bre


I also have a whole bunch of

Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0xc7eb030) isn't a valid IO at /home/httpd/perl/loncnew line 645

and a few

Event: trapped error in `Connection to lonc client 137': Event 'Connection to lonc client 0': GLOB(0xc7eb030) isn't a valid IO at /home/httpd/perl/loncnew line 645

showing up in lonc_error, though there aren't time stamps here.
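
Since lonc_error carries no time stamps, one way to get a feel for these messages is simply to count them by where they were trapped. The following is a minimal Python sketch, assuming each event sits on a single line in the format quoted above; the function name tally_trapped_errors and the normalisation of client numbers to "client N" are illustrative choices, not anything provided by LON-CAPA.

```python
import re
from collections import Counter

# Matches lines of the form quoted above:
#   Event: trapped error in `<where>': <what>
TRAP_RE = re.compile(r"^Event: trapped error in `(?P<where>.*?)': (?P<what>.*)$")


def tally_trapped_errors(lines):
    """Count trapped-error lines, grouped by the location inside the backquotes.

    Client numbers are collapsed to 'client N' (an assumption made here so
    that 'client 137' and 'client 12' land in the same bucket).
    """
    counts = Counter()
    for line in lines:
        m = TRAP_RE.match(line.strip())
        if m is None:
            continue  # not a trapped-error line
        where = re.sub(r"client \d+", "client N", m.group("where"))
        counts[where] += 1
    return counts


# The two messages quoted above, each joined onto one line.
SAMPLE = [
    "Event: trapped error in `?? loncnew:444': Event 'Connection to lonc "
    "client 0': GLOB(0xc7eb030) isn't a valid IO at /home/httpd/perl/loncnew line 645",
    "Event: trapped error in `Connection to lonc client 137': Event 'Connection "
    "to lonc client 0': GLOB(0xc7eb030) isn't a valid IO at /home/httpd/perl/loncnew line 645",
]

for where, n in tally_trapped_errors(SAMPLE).most_common():
    print(n, where)
```

To run it against the real file, pass an open file handle for lonc_error in place of SAMPLE.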


in lonc.log, this is the latest episode with s10 dropping out on capa10 (ohioua6)

Wed Jun  2 22:04:56 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:04:56 2010: s10.lite.msu.edu Connection count: 6 Retries remaining: 3 (insecure)] <font color='blue'>WARNING: A socket timeout was detected</font>
Wed Jun  2 22:04:56 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:04:56 2010: s10.lite.msu.edu Connection count: 6 Retries remaining: 3 (insecure)] <font color='blue'>WARNING: Failing transaction sethost</font>
Wed Jun  2 22:04:56 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:04:56 2010: s10.lite.msu.edu Connection count: 6 Retries remaining: 3 (insecure)] <font color='blue'>WARNING: Shutting down a socket</font>
Wed Jun  2 22:04:56 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:04:56 2010: s10.lite.msu.edu Connection count: 5 Retries remaining: 2 (insecure)] <font color='blue'>WARNING: Lond connection lost.</font>
font color='blue'>WARNING: Shutting down a socket</font>
Wed Jun  2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:05:11 2010: s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] <font color='blue'>WARNING: A socket timeout was detected</font>
Wed Jun  2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:05:11 2010: s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] <font color='blue'>WARNING: Failing transaction sethost</font>
Wed Jun  2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:05:11 2010: s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] <font color='blue'>WARNING: Shutting down a socket</font>
Wed Jun  2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:05:11 2010: s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] <font color='red'>CRITICAL: Host marked DEAD: s10.lite.msu.edu</font>
Wed Jun  2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:05:11 2010: s10.lite.msu.edu >> DEAD <<] <font color='blue'>WARNING: Lond connection lost.</font>
Wed Jun  2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:05:11 2010: s10.lite.msu.edu >> DEAD <<] <font color='blue'>WARNING: Shutting down a socket</font>
Wed Jun  2 22:05:12 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:05:12 2010: s10.lite.msu.edu >> DEAD <<] <font color='blue'>WARNING: A socket timeout was detected</font>

and then DEAD warnings every second until I reset things at 22:16:37
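
To line these episodes up against the httpd errors, it may help to pull out just the moments a host gets marked DEAD. Below is a minimal Python sketch, assuming every lonc.log entry is on one line in the layout shown above (time stamp, pid in parentheses, host in brackets, status in brackets, then a font-wrapped "LEVEL: message"). The names parse_line and dead_hosts are made up for illustration, and the regular expression is inferred only from this excerpt, so other lonc.log line shapes may not match.

```python
import re
from datetime import datetime

# Layout inferred from the excerpt above (an assumption, not a documented format):
#   <time stamp> (<pid>) [<host>] [<status>] <font color='...'>LEVEL: message</font>
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) "
    r"\((?P<pid>\d+)\) \[(?P<host>[^\]]+)\] \[(?P<status>[^\]]*)\] "
    r"<font color='\w+'>(?P<level>\w+): (?P<msg>.*)</font>$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Collapse the double space before single-digit days, then parse the stamp.
    stamp = " ".join(rec["ts"].split())
    rec["time"] = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return rec


def dead_hosts(lines):
    """List (time, host) for every 'Host marked DEAD' entry."""
    events = []
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] == "CRITICAL" and rec["msg"].startswith("Host marked DEAD"):
            events.append((rec["time"], rec["host"]))
    return events


# Two entries from the excerpt above, each joined onto one line.
SAMPLE = [
    "Wed Jun  2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:05:11 2010: "
    "s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] "
    "<font color='blue'>WARNING: A socket timeout was detected</font>",
    "Wed Jun  2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun  2 22:05:11 2010: "
    "s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] "
    "<font color='red'>CRITICAL: Host marked DEAD: s10.lite.msu.edu</font>",
]

for when, host in dead_hosts(SAMPLE):
    print(when.isoformat(), host)
```

Lines that do not fit the pattern, such as the truncated "font color=" fragment in the excerpt, are skipped rather than guessed at.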

Any insights welcome.

Mark

-- 
Mark Lucas                              email: lucasm@ohiou.edu
252D Clippinger Lab                     phone: (740)597-2984
Department of Physics and Astronomy     fax: (740)593-0433
Ohio University
Athens, OH 45701

