[LON-CAPA-dev] lonc connections dying
Mark Lucas
lon-capa-dev@mail.lon-capa.org
Wed, 2 Jun 2010 23:11:56 -0400
Hi,
I've been fighting lonc connections dying all quarter. With a change in textbook, we've been using lots of problems from other domains in several of our courses.

We get the "Unable to find ......." error popping up a lot, particularly over the last week.

I'm finally diving into this and will be checking the logs over the next couple of days.

In the meantime, can anyone tell me what can cause a "DEAD" lonc connection? When I run ps aux, I find lonc marked DEAD for the offending connections. I also find some strange error messages in /var/log/httpd/errors.

Right now, I get in and do a loncontrol reload when I find a dead connection. What would happen if I just killed the dead lonc process? Would it then try to restart?

Here are some samples:
from httpd/errors
[Wed Jun 02 22:05:11 2010] [error] access to /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem failed for 184.57.76.249, reason: Invalid symb for /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem: uploaded/ohiou/8j176084101734b7coucapa2/default_1236609481.sequence___19___msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem
[Wed Jun 02 22:05:11 2010] [error] access to /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem failed for 184.57.76.249, reason: Invalid Access for zm216307 domain ohiou access bre
[Wed Jun 02 22:05:17 2010] [error] [client 132.235.42.74] Apache2::RequestIO::print: (103) Software caused connection abort at /home/httpd/lib/perl//Apache/lonhomework.pm line 1010, referer: http://capa10.phy.ohiou.edu/res/ohiou/serwaylib/Chap29/Radioisotope.problem
[Wed Jun 02 22:05:17 2010] [error] [client 132.235.42.74] Apache2::RequestIO::print: (103) Software caused connection abort at /home/httpd/lib/perl//Apache/lonerrorhandler.pm line 53, referer: http://capa10.phy.ohiou.edu/res/ohiou/serwaylib/Chap29/Radioisotope.problem
[Wed Jun 02 22:06:22 2010] [error] access to /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem failed for 184.57.76.249, reason: Invalid symb for /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem: uploaded/ohiou/8j176084101734b7coucapa2/default_1236609481.sequence___19___msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem
[Wed Jun 02 22:06:22 2010] [error] access to /res/msu/physicslib/msuphysicslib/70_CircAC2_LRC_Power/msuprob04b.problem failed for 184.57.76.249, reason: Invalid Access for zm216307 domain ohiou access bre

I also have a whole bunch of

Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0xc7eb030) isn't a valid IO at /home/httpd/perl/loncnew line 645

and a few

Event: trapped error in `Connection to lonc client 137': Event 'Connection to lonc client 0': GLOB(0xc7eb030) isn't a valid IO at /home/httpd/perl/loncnew line 645

showing up in lonc_error, though there aren't time stamps here.

In lonc.log, this is the latest episode, with s10 dropping out on capa10 (ohioua6):

Wed Jun 2 22:04:56 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:04:56 2010: s10.lite.msu.edu Connection count: 6 Retries remaining: 3 (insecure)] <font color='blue'>WARNING: A socket timeout was detected</font>
Wed Jun 2 22:04:56 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:04:56 2010: s10.lite.msu.edu Connection count: 6 Retries remaining: 3 (insecure)] <font color='blue'>WARNING: Failing transaction sethost</font>
Wed Jun 2 22:04:56 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:04:56 2010: s10.lite.msu.edu Connection count: 6 Retries remaining: 3 (insecure)] <font color='blue'>WARNING: Shutting down a socket</font>
Wed Jun 2 22:04:56 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:04:56 2010: s10.lite.msu.edu Connection count: 5 Retries remaining: 2 (insecure)] <font color='blue'>WARNING: Lond connection lost.</font>
font color='blue'>WARNING: Shutting down a socket</font>
Wed Jun 2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:05:11 2010: s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] <font color='blue'>WARNING: A socket timeout was detected</font>
Wed Jun 2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:05:11 2010: s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] <font color='blue'>WARNING: Failing transaction sethost</font>
Wed Jun 2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:05:11 2010: s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] <font color='blue'>WARNING: Shutting down a socket</font>
Wed Jun 2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:05:11 2010: s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] <font color='red'>CRITICAL: Host marked DEAD: s10.lite.msu.edu</font>
Wed Jun 2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:05:11 2010: s10.lite.msu.edu >> DEAD <<] <font color='blue'>WARNING: Lond connection lost.</font>
Wed Jun 2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:05:11 2010: s10.lite.msu.edu >> DEAD <<] <font color='blue'>WARNING: Shutting down a socket</font>
Wed Jun 2 22:05:12 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:05:12 2010: s10.lite.msu.edu >> DEAD <<] <font color='blue'>WARNING: A socket timeout was detected</font>

and then DEAD warnings every second until I reset things at 22:16:37
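
For anyone wanting to pull episodes like this out of their own lonc.log, here is a rough Python sketch that lists when each host was marked DEAD. It assumes the line layout seen in the excerpt above (timestamp, pid in parentheses, host in brackets, a bracketed status, then the message inside <font> tags). The names LINE_RE, find_dead_events and SAMPLE are illustrative only and are not part of LON-CAPA; the real log format may differ between versions.

```python
import re
from typing import Iterable, List, Tuple

# Assumed layout, inferred only from the lonc.log excerpt in this post:
#   <timestamp> (<pid>) [<host>] [<status>] <font color='...'>LEVEL: message</font>
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"\((?P<pid>\d+)\) \[(?P<host>[^\]]+)\] .*"
    r"(?P<level>WARNING|CRITICAL): (?P<msg>.*?)</font>\s*$"
)


def find_dead_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, host) for every 'Host marked DEAD' entry."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip fragments and lines in another format
        if m.group("level") == "CRITICAL" and m.group("msg").startswith("Host marked DEAD"):
            events.append((m.group("ts"), m.group("host")))
    return events


# Three lines copied from the excerpt, used here as a small self-check.
SAMPLE = [
    "Wed Jun 2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:05:11 2010: "
    "s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] "
    "<font color='blue'>WARNING: Shutting down a socket</font>",
    "Wed Jun 2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:05:11 2010: "
    "s10.lite.msu.edu Connection count: 5 Retries remaining: 1 (insecure)] "
    "<font color='red'>CRITICAL: Host marked DEAD: s10.lite.msu.edu</font>",
    "Wed Jun 2 22:05:11 2010 (20029) [s10.lite.msu.edu] [Wed Jun 2 22:05:11 2010: "
    "s10.lite.msu.edu >> DEAD <<] <font color='blue'>WARNING: Lond connection lost.</font>",
]

for ts, host in find_dead_events(SAMPLE):
    print(f"{host} marked DEAD at {ts}")
```

To run it against a real log, pass an open file object instead of SAMPLE; the function only iterates over lines, so it does not need the whole file in memory.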
Any insights welcome.
Mark
-- 
Mark Lucas                             email: lucasm@ohiou.edu
252D Clippinger Lab                    phone: (740)597-2984
Department of Physics and Astronomy    fax: (740)593-0433
Ohio University
Athens, OH 45701