<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

<meta name="Generator" content="Microsoft Exchange Server">

<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>

</head>

<body>

<div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12.0pt; line-height:1.3; color:#385623">

<div>Excellent. Thank you.<br>

</div>

<div><br>

</div>

<div id="x_signature-x" class="x_signature_editor" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12.0pt; color:#385623">

<div>

<div>Mike B<br>

</div>

<br>

</div>

</div>

</div>

<div id="x_quoted_header" style="clear:both">

<hr style="border:none; height:1px; color:#E1E1E1; background-color:#E1E1E1">

<div style="border:none; padding:3.0pt 0cm 0cm 0cm"><span style="font-size:11.0pt; font-family:'Calibri','sans-serif'"><b>From:</b> Stuart Raeburn <raeburn@msu.edu><br>

<b>Sent:</b> Oct 18, 2016 5:38 PM<br>

<b>To:</b> lon-capa-admin@mail.lon-capa.org<br>

<b>Subject:</b> Re: [LON-CAPA-admin] high memory usage by long lived lonc processes<br>

</span></div>

</div>

<br type="attribution">

</div>

<font size="2"><span style="font-size:10pt;">

<div class="PlainText">Mike,<br>

<br>

><br>

> Is there a way to safely kill a lonc process without causing an   <br>

> error for the users?  I can't endlessly throw swap at it for the   <br>

> next month until I have an approved time for maintenance.<br>

><br>

<br>

You can kill those lonc processes.<br>

<br>

Since your library node is also a load balancer, whenever a log-in  <br>

occurs the access servers will be contacted to determine their current  <br>

load, so that the session can be sent to the one with the lowest load.  <br>

Requesting load information requires a lonc connection to the  <br>

access server.<br>

<br>

After you have killed the long-running lonc process for a particular  <br>

access node, the parent lonc process on the library node should spawn  <br>

a new child lonc connection to the access node when the next user  <br>

logs in.<br>

<br>

However, if that doesn't work out, you could also kill the parent lonc  <br>

process and then run /etc/init.d/loncontrol start to start a new  <br>

lonc parent, etc.<br>
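As a sketch only (not part of LON-CAPA), the oversized lonc children could be picked out by resident memory before killing them. It assumes, based on the top output quoted in this thread, that the children run as user www with a process title beginning "lonc: HOST", and that the parent shows up as "lonc: Parent keeping the flock"; the function name find_big_lonc and the 100 MB threshold are arbitrary choices.<br>

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of lonc child processes whose resident
# memory (RSS, in kB) exceeds a threshold.
# Input lines on stdin: "PID RSS_KB ARGS...", as produced by
#   ps -u www -o pid=,rss=,args=
# Assumes children have a process title starting with "lonc:" and that the
# parent's title is "lonc: Parent keeping the flock" (it is skipped).
find_big_lonc() {
    awk -v limit="$1" '
        $3 == "lonc:" && $4 != "Parent" && $2 + 0 > limit + 0 { print $1 }
    '
}

# Dry run with a 100 MB (102400 kB) threshold:
# ps -u www -o pid=,rss=,args= | find_big_lonc 102400
# To terminate them (the parent should spawn a new child on the next log-in):
# ps -u www -o pid=,rss=,args= | find_big_lonc 102400 | xargs -r kill
```

Killing one child at a time and confirming that a new connection appears after the next log-in seems the safer route; the kill line is left commented out so the listing can be checked first.<br>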

<br>

><br>

> Could there be a memory leak hidden by the 5 minute idle timeout in   <br>

> lonc?  Since we almost never go 5 minutes of idle lonc, are we   <br>

> hitting it?<br>

><br>

<br>

The item repeated many times in lonc_errors:<br>

<br>

><br>

> There are a LOT of lines like this:<br>

> Event: trapped error in `?? loncnew:444': Event 'Connection to lonc   <br>

> client 0': GLOB(0x1c51740) isn't a valid IO at   <br>

> /home/httpd/perl/loncnew line 647<br>

><br>

<br>

suggests that the lonc process got into an error state from which it  <br>

has not recovered cleanly. I suspect that post-error state is the  <br>

reason why memory usage keeps climbing.  Killing the lonc process  <br>

would be a good solution.<br>
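To see which errors dominate lonc_errors despite the missing timestamps, a rough tally could look like this. The helper name summarize_lonc_errors is hypothetical, and it assumes each trapped error occupies a single line in the log.<br>

```shell
#!/bin/sh
# Sketch: tally the distinct "trapped error" messages in lonc_errors.
# Memory addresses such as GLOB(0x1c51740) and lonc client numbers vary
# from line to line, so they are normalized first; otherwise identical
# errors would each be counted separately. Assumes one error per line.
summarize_lonc_errors() {
    grep 'trapped error' "$1" |
        sed -e 's/0x[0-9a-f][0-9a-f]*/0xADDR/g' \
            -e 's/client [0-9][0-9]*/client N/g' |
        sort | uniq -c | sort -rn
}

# Example (log path as given in this thread):
# summarize_lonc_errors /home/httpd/perl/logs/lonc_errors
```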

<br>

Given the frequency of log-ins to the msu load-balancer node from 9000  <br>

LON-CAPA users at MSU, and the fact that a monitoring service  <br>

completes a LON-CAPA log-in to that node every 5 minutes, it seems  <br>

likely that the 5 minute idle timeout is not encountered very  <br>

frequently on the msu load balancer either.<br>
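One way to check how long each lonc connection has survived without hitting the idle timeout is to look at elapsed process time. This hypothetical old_lonc helper reads ps output in the portable etime format; the 7-day cut-off is arbitrary, and it assumes lonc children have a process title beginning with "lonc:", as in the top output in this thread.<br>

```shell
#!/bin/sh
# Sketch: print "PID AGE_SECONDS" for lonc child processes older than N days.
# Input lines on stdin: "PID ETIME ARGS...", as produced by
#   ps -u www -o pid=,etime=,args=
# ETIME has the form [[DD-]hh:]mm:ss. The parent process is skipped.
old_lonc() {
    awk -v days="$1" '
        $3 == "lonc:" && $4 != "Parent" {
            t = $2; d = 0
            if (index(t, "-")) { split(t, a, "-"); d = a[1] + 0; t = a[2] }
            n = split(t, p, ":")
            secs = d * 86400 + (n == 3 ? p[1] * 3600 + p[2] * 60 + p[3] : p[1] * 60 + p[2])
            if (secs >= days * 86400) print $1, secs
        }'
}

# Example: lonc connections that have been up for a week or more.
# ps -u www -o pid=,etime=,args= | old_lonc 7
```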

<br>

For that node,<br>

<br>

top -cbn1 -u www | grep lonc | grep msu<br>

<br>

reports:<br>

<br>

184m 10m 3172 S  0.0  0.3   0:30.16 lonc: s1.lite.msu.edu Connection  <br>

count: 1 Retries remaining: 5 (ssl) Tue Oct 18 16:33:08 2016<br>

<br>

185m 10m 3172 S  0.0  0.3   0:31.48 lonc: s2.lite.msu.edu Connection  <br>

count: 1 Retries remaining: 5 (ssl) Tue Oct 18 16:33:09 2016<br>

<br>

184m  10m 3172 S  0.0  0.3   0:04.91 lonc: s3.lite.msu.edu Connection  <br>

count: 2 Retries remaining: 5 (ssl) Tue Oct 18 16:33:08 2016<br>

<br>

184m  10m 3172 S  0.0  0.3   0:28.85 lonc: s4.lite.msu.edu Connection  <br>

count: 1 Retries remaining: 5 (ssl) Tue Oct 18 16:33:08 2016<br>
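The VmRSS (resident) and VmSwap (swapped-out) figures discussed in this thread can be collected for every lonc process in one pass. This is a sketch that reads Linux /proc/PID/status directly; mem_by_name is a hypothetical name, the VmSwap line may be missing on older kernels (it is then reported as 0), and the optional second argument exists only so the function can be tried against a copy of /proc. The pattern '^lonc' matches the "loncnew" process name reported earlier in the thread.<br>

```shell
#!/bin/sh
# Sketch: print "NAME PID RSS_KB SWAP_KB" for every process whose status
# Name matches a regular expression, sorted by swap usage (largest first).
# Usage: mem_by_name PATTERN [PROC_DIR]   (PROC_DIR defaults to /proc)
mem_by_name() {
    pattern=$1
    procdir=${2:-/proc}
    for f in "$procdir"/[0-9]*/status; do
        [ -r "$f" ] || continue
        # Processes can exit mid-loop, so read errors are discarded.
        awk -v pat="$pattern" -v pid="$(basename "$(dirname "$f")")" '
            /^Name:/   { name = $2 }
            /^VmRSS:/  { rss = $2 }
            /^VmSwap:/ { swap = $2 }
            END { if (name ~ pat) printf "%s %s %d %d\n", name, pid, rss, swap }
        ' "$f" 2>/dev/null
    done | sort -k4,4nr
}

# Example: mem_by_name '^lonc'
```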

<br>

<br>

Stuart Raeburn<br>

LON-CAPA Academic Consortium<br>

<br>

Quoting "Budzik, Michael J." <mikeb@purdue.edu>:<br>

<br>

> Yes, purduel1 is our library node and load balancer node.  That's   <br>

> where that top output was from.<br>

><br>

>> unlike the RES value, the VIRT value is dependent on the Linux   <br>

>> distro -- it's much lower for CentOS 5 than for<br>

>> CentOS 6 or 7, even though RES is about the same for all).<br>

><br>

> RES only includes what is in physical memory. VIRT does include the   <br>

> size of shared libraries, but, more significantly for this issue, it  <br>

>  also includes swap used by that process.  You can see that we are   <br>

> using about 1GB of SWAP for each of the lonc processes in question:<br>

><br>

> for file in /proc/*/status ; do awk '/^Name:/{n=$2} /^VmSwap:/{print n, $2, $3}'   <br>

> "$file"; done | sort -k 2 -n -r | less<br>

> loncnew 1065732 kB<br>

> loncnew 1059156 kB<br>

> loncnew 1056384 kB<br>

> loncnew 1050452 kB<br>

> loncnew 1049516 kB<br>

><br>

> Is there a way to safely kill a lonc process without causing an   <br>

> error for the users?  I can't endlessly throw swap at it for the   <br>

> next month until I have an approved time for maintenance.<br>

><br>

> Could there be a memory leak hidden by the 5 minute idle timeout in   <br>

> lonc?  Since we almost never go 5 minutes of idle lonc, are we   <br>

> hitting it?<br>

><br>

>> Do you find anything meaningful in   <br>

>> /home/httpd/perl/logs/lonc_errors on your library server?<br>

><br>

> There are a LOT of lines like this:<br>

> Event: trapped error in `?? loncnew:444': Event 'Connection to lonc   <br>

> client 0': GLOB(0x1c51740) isn't a valid IO at   <br>

> /home/httpd/perl/loncnew line 647<br>

><br>

> There are several handfuls of lines like this:<br>

> Event: trapped error in `Connection to lonc client 194': Event   <br>

> 'Connection to lonc client 0': GLOB(0x1c77c50) isn't a valid IO at   <br>

> /home/httpd/perl/loncnew line 647<br>

><br>

> Those all seem to be in 2 or 3 clusters sort of near the top of the   <br>

> log. There are no timestamps, so I'm not sure if those are related   <br>

> to something like the 05:10 daily reloads.<br>

><br>

> There are a few lines like this:<br>

> Event: trapped error in `Connection to lonc client 14': Can't locate  <br>

>  object method "Shutdown" via package   <br>

> "LondConnection=HASH(0x1c50ca8)" (perhaps you forgot to load   <br>

> "LondConnection=HASH(0x1c50ca8)"?) at /home/httpd/perl/loncnew line   <br>

> 754.<br>

><br>

> Thanks!<br>

> Mike B<br>

><br>

><br>

> -----Original Message-----<br>

> From: lon-capa-admin-bounces@mail.lon-capa.org   <br>

> [<a href="mailto:lon-capa-admin-bounces@mail.lon-capa.org">mailto:lon-capa-admin-bounces@mail.lon-capa.org</a>] On Behalf Of<br>

> Stuart Raeburn<br>

> Sent: Tuesday, October 18, 2016 10:50 AM<br>

> To: lon-capa-admin@mail.lon-capa.org<br>

> Subject: Re: [LON-CAPA-admin] high memory usage by long lived lonc processes<br>

><br>

> Mike,<br>

><br>

>><br>

>> Are we the only ones seeing lonc processes last well over 20 days<br>

>> and continue to allocate more and more RAM?<br>

>><br>

><br>

> Yes, I suspect you may be the only one.<br>

><br>

> I'm not seeing long-lived lonc processes with high memory values   <br>

> reported for VIRT or RES in top on any of the LON-CAPA instances I   <br>

> manage (msu.edu, educog.com, loncapa.net).<br>

><br>

> The RES (Resident memory) value is the one I am typically concerned  <br>

>  about, and that is around 10 MB for each lonc process.  (I also see  <br>

>  around 185 MB for VIRT for each lonc process, but unlike the RES   <br>

> value, the VIRT value is dependent on the Linux distro -- it's much   <br>

> lower for CentOS 5 than for CentOS 6 or 7, even though RES is about   <br>

> the same for all).<br>

><br>

> In any case, the RES values are also anomalous for the lonc   <br>

> processes for connections to your access servers (250 MB instead of   <br>

> 9 MB).<br>

><br>

> When I check top in the msu domain, I expect to consistently see lonc   <br>

> connections between the LON-CAPA load balancer server and the access   <br>

> servers, but on the library server I typically expect to see a lond   <br>

> connection to each access server.<br>

><br>

> If the top output is for your library server, is purduel1 also   <br>

> configured as a LON-CAPA load balancer?<br>

><br>

> If not, then you'd typically only see lonc connections initiated to   <br>

> your access servers when published resources are republished on the   <br>

> library server, and an "update" notification is sent to each access   <br>

> server which is subscribed to the resource.<br>

><br>

> Do you find anything meaningful in /home/httpd/perl/logs/lonc_errors  <br>

>  on your library server?<br>

><br>

><br>

> Stuart Raeburn<br>

> LON-CAPA Academic Consortium<br>

><br>

> Quoting "Budzik, Michael J." <mikeb@purdue.edu>:<br>

><br>

>> Are we the only ones seeing lonc processes last well over 20 days<br>

>> and continue to allocate more and more RAM?  We now have 5 lonc<br>

>> processes that are each using over 1.5GB RAM.<br>

>> Mike B<br>

>><br>

>> From: lon-capa-admin-bounces@mail.lon-capa.org<br>

>> [<a href="mailto:lon-capa-admin-bounces@mail.lon-capa.org">mailto:lon-capa-admin-bounces@mail.lon-capa.org</a>] On Behalf Of<br>

>> Budzik, Michael J.<br>

>> Sent: Friday, October 14, 2016 1:04 PM<br>

>> To: 'lon-capa-admin@mail.lon-capa.org'<br>

>> <lon-capa-admin@mail.lon-capa.org><br>

>> Subject: [LON-CAPA-admin] high memory usage by long lived lonc<br>

>> processes<br>

>><br>

>><br>

>> Our lonc processes that live a long time end up using a lot of RAM.<br>

>>  Here are a few rows of output from top.  Check out the lonc<br>

>> processes in the middle of the list that are each using 1.3 GB of RAM<br>

>> compared to the others that are around 180 MB.<br>

>><br>

>><br>

>><br>

>> # top -cbn1 -u www | grep lonc<br>

>><br>

>> 5058 www       20   0  184m 7604 1176 S  0.0  0.1   0:01.29 lonc:<br>

>> capa9.phy.ohio.edu Connection count: 0 Retries remaining: 5 () Fri<br>

>> Oct 14 11:35:09 2016<br>

>><br>

>>  5103 www       20   0  184m 7564 1176 S  0.0  0.1   0:00.95 lonc:<br>

>> meitner.physics.hope.edu Connection count: 0 Retries remaining: 5 ()<br>

>> Fri Oct 14 11:35:09 2016<br>

>><br>

>> 18053 www       20   0  180m 7384  920 S  0.0  0.1   0:10.15 lonc:<br>

>> Parent keeping the flock Fri Oct 14 12:46:50 2016<br>

>><br>

>> 18063 www       20   0 1321m 251m 1224 S  0.0  3.2  20:24.92 lonc:<br>

>> loncapa02.purdue.edu Connection count: 2 Retries remaining: 5<br>

>> (insecure) Fri Oct 14 12:49:42 2016<br>

>><br>

>> 18067 www       20   0 1321m 250m 1224 S  0.0  3.2  21:45.86 lonc:<br>

>> loncapa05.purdue.edu Connection count: 2 Retries remaining: 5<br>

>> (insecure) Fri Oct 14 12:49:41 2016<br>

>><br>

>> 21139 www       20   0 1321m 250m 1224 S  0.0  3.2  21:57.04 lonc:<br>

>> loncapa07.purdue.edu Connection count: 2 Retries remaining: 5<br>

>> (insecure) Fri Oct 14 12:49:41 2016<br>

>><br>

>> 21150 www       20   0 1321m 248m 1224 S  0.0  3.2  22:11.91 lonc:<br>

>> loncapa04.purdue.edu Connection count: 2 Retries remaining: 5<br>

>> (insecure) Fri Oct 14 12:49:42 2016<br>

>><br>

>> 21151 www       20   0 1321m 253m 1224 S  0.0  3.2  21:48.87 lonc:<br>

>> loncapa06.purdue.edu Connection count: 2 Retries remaining: 5<br>

>> (insecure) Fri Oct 14 12:49:42 2016<br>

>><br>

>> 22900 www       20   0  182m 8756 1972 S  0.0  0.1   0:00.93 lonc:<br>

>> loncapa03.purdue.edu Connection count: 1 Retries remaining: 5<br>

>> (insecure) Fri Oct 14 12:49:41 2016<br>

>><br>

>> 29226 www       20   0  184m 8900 2060 S  0.0  0.1   0:00.04 lonc:<br>

>> capa4.phy.ohio.edu Connection count: 3 Retries remaining: 5<br>

>> (insecure) Fri Oct 14 12:49:42 2016<br>

>><br>

>> 29419 www       20   0  182m 8776 1972 S  0.0  0.1   0:00.11 lonc:<br>

>> loncapa.purdue.edu Connection count: 1 Retries remaining: 5<br>

>> (insecure) Fri Oct 14 12:49:41 2016<br>

>><br>

>><br>

>><br>

>><br>

>><br>

>> Anyone else see this?<br>

>><br>

>><br>

>><br>

>> Thanks,<br>

>><br>

>> Mike Budzik<br>

>><br>

>> Interim Manager, Student Systems and Web Services Admin<br>

>><br>

>> IT Infrastructure - Purdue University<br>

<br>

_______________________________________________<br>

LON-CAPA-admin mailing list<br>

LON-CAPA-admin@mail.lon-capa.org<br>

<a href="http://mail.lon-capa.org/mailman/listinfo/lon-capa-admin">http://mail.lon-capa.org/mailman/listinfo/lon-capa-admin</a><br>

</div>

</span></font>

</body>

</html>