<div dir="ltr"><div><div><div><div>Mike,<br><br>I've been following this thread because I do get entries in lonc_errors similar to those that you posted but I don't get large memory usage on the long lived processes, they remain about 10MB, like Stuart's. I will see these lonc_error entries about once or twice a month on different servers. I've always associate them with the server being really busy but this morning there were fresh entries in lonc_errors for one of my servers and it wasn't that busy yesterday. The server's worst load was:<br><br>Active=16<br>Moderately Active=76<br>Inactive=191<br>loadavg=0.42 0.21 0.16 1/487 57695<br><br></div><div></div>Looking the the log entries, I noticed that the "client number" in subsequent log entries increases. I don't know if that indicates anything useful in figuring out which process (the parent?) is having trouble. Here is my log:<br><br>Event: trapped error in `Connection to lonc client 5': Can't locate object method "Shutdown" via package "LondConnection=HASH(0x27ba858)" (perhaps you forgot to load "LondConnection=HASH(<br>0x27ba858)"?) at /home/httpd/perl/loncnew line 754.<br>Event: trapped error in `Connection to lonc client 6': Can't locate object method "Shutdown" via package "LondConnection=HASH(0x27ba858)" (perhaps you forgot to load "LondConnection=HASH(<br>0x27ba858)"?) at /home/httpd/perl/loncnew line 754.<br>Event: trapped error in `Connection to lonc client 7': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `Connection to lonc client 8': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `Connection to lonc client 9': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `Connection to lonc client 10': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `?? loncnew:444': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `Connection to lonc client 11': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br>Event: trapped error in `Connection to lonc client 12': Event 'Connection to lonc client 0': GLOB(0x27bcfc8) isn't a valid IO at /home/httpd/perl/loncnew line 647.<br><br></div>Bob Gonzales<br></div>Chemistry Dept<br></div>Binghamton University<br><div><div><div><div><div><br></div></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 19, 2016 at 1:46 PM, Budzik, Michael J. <span dir="ltr"><<a href="mailto:mikeb@purdue.edu" target="_blank">mikeb@purdue.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Stuart,<br>
We killed the lonc processes that were behaving unexpectedly - just the individual ones, not the parent "flock keeper". The replacement ones spawned as expected and now the log looks good.<br>
Thanks for your help with this.<br>
<span class="im HOEnZb"><br>
Mike B<br>
<br>
<br>
-----Original Message-----<br>
From: <a href="mailto:lon-capa-admin-bounces@mail.lon-capa.org">lon-capa-admin-bounces@mail.<wbr>lon-capa.org</a> [mailto:<a href="mailto:lon-capa-admin-bounces@mail.lon-capa.org">lon-capa-admin-<wbr>bounces@mail.lon-capa.org</a>] On Behalf Of Stuart Raeburn<br>
</span><div class="HOEnZb"><div class="h5">Sent: Tuesday, October 18, 2016 5:22 PM<br>
To: <a href="mailto:lon-capa-admin@mail.lon-capa.org">lon-capa-admin@mail.lon-capa.<wbr>org</a><br>
Subject: Re: [LON-CAPA-admin] high memory usage by long lived lonc processes<br>
<br>
Mike,<br>
<br>
><br>
> Is there a way to safely kill a lonc process without causing an<br>
> error for the users? I can't endlessly throw swap at it for the<br>
> next month until I have an approved time for maintenance.<br>
><br>
<br>
You can kill those lonc processes.<br>
<br>
Since your library node is also a load-balancer then whenever a log-in occurs, the access servers will be contacted to determine current load so that sessions can be sent to the one with the lowest load. Making a request for load information will require a lonc connection to the access server.<br>
<br>
After you have killed the long-running lonc process for a particular access node, the parent lonc process on the library node should spawn a new child lonc connection to the access node when the next user logs-in.<br>
<br>
However, if that doesn't work out you could kill the parent lonc process too, and then do: /etc/init.d/loncontrol start to start a new lonc parent etc.<br>
<br>
><br>
> Could there be a memory leak hidden by the 5 minute idle timeout in<br>
> lonc? Since we almost never go 5 minutes of idle lonc, are we<br>
> hitting it?<br>
><br>
<br>
The item repeated many times in lonc_errors:<br>
<br>
><br>
> There are a LOT of lines like this:<br>
> Event: trapped error in `?? loncnew:444': Event 'Connection to lonc<br>
> client 0': GLOB(0x1c51740) isn't a valid IO at<br>
> /home/httpd/perl/loncnew line 647<br>
><br>
<br>
suggests that the lonc process got into an error state from which it has not recovered cleanly. I suspect that post-error state is the reason why memory usage keeps climbing. Killing the lonc process would be a good solution.<br>
<br>
Given the frequency of log-ins to the msu load-balancer node from 9000 LON-CAOA users at MSU, and also the fact that a monitoring service completes a LON-CAPA log-in to that node every 5 minutes, it seems likely that the 5 minute idle timeout is not encountered on the msu load-balancer very frequently either.<br>
<br>
For that node,<br>
<br>
top -cbn1 -u www | grep lonc |grep msu<br>
<br>
reports:<br>
<br>
184m 10m 3172 S 0.0 0.3 0:30.16 lonc: <a href="http://s1.lite.msu.edu" rel="noreferrer" target="_blank">s1.lite.msu.edu</a> Connection<br>
count: 1 Retries remaining: 5 (ssl) Tue Oct 18 16:33:08 2016<br>
<br>
185m 10m 3172 S 0.0 0.3 0:31.48 lonc: <a href="http://s2.lite.msu.edu" rel="noreferrer" target="_blank">s2.lite.msu.edu</a> Connection<br>
count: 1 Retries remaining: 5 (ssl) Tue Oct 18 16:33:09 2016<br>
<br>
184m 10m 3172 S 0.0 0.3 0:04.91 lonc: <a href="http://s3.lite.msu.edu" rel="noreferrer" target="_blank">s3.lite.msu.edu</a> Connection<br>
count: 2 Retries remaining: 5 (ssl) Tue Oct 18 16:33:08 2016<br>
<br>
184m 10m 3172 S 0.0 0.3 0:28.85 lonc: <a href="http://s4.lite.msu.edu" rel="noreferrer" target="_blank">s4.lite.msu.edu</a> Connection<br>
count: 1 Retries remaining: 5 (ssl) Tue Oct 18 16:33:08 2016<br>
<br>
<br>
Stuart Raeburn<br>
LON-CAPA Academic Consortium<br>
<br>
Quoting "Budzik, Michael J." <<a href="mailto:mikeb@purdue.edu">mikeb@purdue.edu</a>>:<br>
<br>
> Yes, purduel1 is our library node and load balancer node. That's<br>
> where that top output was from.<br>
><br>
>> unlike the RES value, the VIRT value is dependent on the Linux<br>
>> distro -- it's much lower for CentOS 5 than for CentOS 6 or 7, even<br>
>> though RES is about the same for all).<br>
><br>
> RES only includes what is in physical memory. VIRT does include the<br>
> size of shared libraries, but, more significantly for this issue, it<br>
> also includes swap used by that process. You can see that we are<br>
> using about 1GB of SWAP for each of the lonc processes in question:<br>
><br>
> for file in /proc/*/status ; do awk '/VmSwap|Name/{printf $2 " "<br>
> $3}END{ print ""}' $file; done | sort -k 2 -n -r | less loncnew<br>
> 1065732 kB loncnew 1059156 kB loncnew 1056384 kB loncnew 1050452 kB<br>
> loncnew 1049516 kB<br>
><br>
> Is there a way to safely kill a lonc process without causing an<br>
> error for the users? I can't endlessly throw swap at it for the<br>
> next month until I have an approved time for maintenance.<br>
><br>
> Could there be a memory leak hidden by the 5 minute idle timeout in<br>
> lonc? Since we almost never go 5 minutes of idle lonc, are we<br>
> hitting it?<br>
><br>
>> Do you find anything meaningful in<br>
>> /home/httpd/perl/logs/lonc_<wbr>errors on your library server?<br>
><br>
> There are a LOT of lines like this:<br>
> Event: trapped error in `?? loncnew:444': Event 'Connection to lonc<br>
> client 0': GLOB(0x1c51740) isn't a valid IO at<br>
> /home/httpd/perl/loncnew line 647<br>
><br>
> There are several handfuls of lines like this:<br>
> Event: trapped error in `Connection to lonc client 194': Event<br>
> 'Connection to lonc client 0': GLOB(0x1c77c50) isn't a valid IO at<br>
> /home/httpd/perl/loncnew line 647<br>
><br>
> Those all seem to be in 2 or 3 clusters sort of near the top of the<br>
> log. There are no timestamps, so I'm not sure of those are related<br>
> to something like the 05:10 daily reloads.<br>
><br>
> There are a few lines like this:<br>
> Event: trapped error in `Connection to lonc client 14': Can't locate<br>
> object method "Shutdown" via package<br>
> "LondConnection=HASH(<wbr>0x1c50ca8)" (perhaps you forgot to load<br>
> "LondConnection=HASH(<wbr>0x1c50ca8)"?) at /home/httpd/perl/loncnew line<br>
> 754.<br>
><br>
> Thanks!<br>
> Mike B<br>
><br>
><br>
> -----Original Message-----<br>
> From: <a href="mailto:lon-capa-admin-bounces@mail.lon-capa.org">lon-capa-admin-bounces@mail.<wbr>lon-capa.org</a><br>
> [mailto:<a href="mailto:lon-capa-admin-bounces@mail.lon-capa.org">lon-capa-admin-<wbr>bounces@mail.lon-capa.org</a>] On Behalf Of<br>
> Stuart Raeburn<br>
> Sent: Tuesday, October 18, 2016 10:50 AM<br>
> To: <a href="mailto:lon-capa-admin@mail.lon-capa.org">lon-capa-admin@mail.lon-capa.<wbr>org</a><br>
> Subject: Re: [LON-CAPA-admin] high memory usage by long lived lonc<br>
> processes<br>
><br>
> Mike,<br>
><br>
>><br>
>> Are we the only ones seeing lonc processes last well over 20 days and<br>
>> continue to allocate more and more RAM?<br>
>><br>
><br>
> Yes, I suspect you may be the only one.<br>
><br>
> I'm not seeing long-lived lonc processes with high memory values<br>
> reported for VIRT or RES in top on any of the LON-CAPA instances I<br>
> manage (<a href="http://msu.edu" rel="noreferrer" target="_blank">msu.edu</a>, <a href="http://educog.com" rel="noreferrer" target="_blank">educog.com</a>, <a href="http://loncapa.net" rel="noreferrer" target="_blank">loncapa.net</a>).<br>
><br>
> The RES (Resident memeory) value is the one I am typically concerned<br>
> about, and that is around 10 MB for each lonc process. (I also see<br>
> around 185 MB for VIRT for each lonc process, but unlike the RES<br>
> value, the VIRT value is dependent on the Linux distro -- it's much<br>
> lower for CentOS 5 than for CentOS 6 or 7, even though RES is about<br>
> the same for all).<br>
><br>
> In any case, the RES values are also anomalous for the lonc<br>
> processes for connections to your access servers (250 MB instead of<br>
> 9 MB).<br>
><br>
> In the msu domain I expect to consistently see lonc connections<br>
> between the LON-CAPA load balancer server and the access servers,<br>
> when I check top, but on the library server I typically expect to<br>
> see a lond connection to each access server.<br>
><br>
> If the top output is for your library server, is purduel1 also<br>
> configured as a LON-CAPA load balancer?<br>
><br>
> If not, then you'd typically only see lonc connections initiated to<br>
> your access servers when published resources are republished on the<br>
> library server, and an "update" notification is sent to each access<br>
> server which is subscribed to the resource.<br>
><br>
> Do you find anything meaningful in /home/httpd/perl/logs/lonc_<wbr>errors<br>
> on your library server?<br>
><br>
><br>
> Stuart Raeburn<br>
> LON-CAPA Academic Consortium<br>
><br>
> Quoting "Budzik, Michael J." <<a href="mailto:mikeb@purdue.edu">mikeb@purdue.edu</a>>:<br>
><br>
>> Are we the only ones seeing lonc processes last well over 20 days and<br>
>> continue to allocate more and more RAM? We now have 5 lonc processes<br>
>> that are each using over 1.5GB RAM.<br>
>> Mike B<br>
>><br>
>> From: <a href="mailto:lon-capa-admin-bounces@mail.lon-capa.org">lon-capa-admin-bounces@mail.<wbr>lon-capa.org</a><br>
>> [mailto:<a href="mailto:lon-capa-admin-bounces@mail.lon-capa.org">lon-capa-admin-<wbr>bounces@mail.lon-capa.org</a>] On Behalf Of<br>
>> Budzik, Michael J.<br>
>> Sent: Friday, October 14, 2016 1:04 PM<br>
>> To: '<a href="mailto:lon-capa-admin@mail.lon-capa.org">lon-capa-admin@mail.lon-capa.<wbr>org</a>'<br>
>> <<a href="mailto:lon-capa-admin@mail.lon-capa.org">lon-capa-admin@mail.lon-capa.<wbr>org</a>><br>
>> Subject: [LON-CAPA-admin] high memory usage by long lived lonc<br>
>> processes<br>
>><br>
>><br>
>> Our lonc processes that live a long time end up using a lot of RAM.<br>
>> Here are a few rows of output from top. Check out the lonc<br>
>> processes in the middle of the list that are each using 1.3GB ram<br>
>> compared to the others that are around 180 MB.<br>
>><br>
>><br>
>><br>
>> # top -cbn1 -u www | grep lonc<br>
>><br>
>> 5058 www 20 0 184m 7604 1176 S 0.0 0.1 0:01.29 lonc:<br>
>> <a href="http://capa9.phy.ohio.edu" rel="noreferrer" target="_blank">capa9.phy.ohio.edu</a> Connection count: 0 Retries remaining: 5 () Fri<br>
>> Oct 14 11:35:09 2016<br>
>><br>
>> 5103 www 20 0 184m 7564 1176 S 0.0 0.1 0:00.95 lonc:<br>
>> <a href="http://meitner.physics.hope.edu" rel="noreferrer" target="_blank">meitner.physics.hope.edu</a> Connection count: 0 Retries remaining: 5 ()<br>
>> Fri Oct 14 11:35:09 2016<br>
>><br>
>> 18053 www 20 0 180m 7384 920 S 0.0 0.1 0:10.15 lonc:<br>
>> Parent keeping the flock Fri Oct 14 12:46:50 2016<br>
>><br>
>> 18063 www 20 0 1321m 251m 1224 S 0.0 3.2 20:24.92 lonc:<br>
>> <a href="http://loncapa02.purdue.edu" rel="noreferrer" target="_blank">loncapa02.purdue.edu</a> Connection count: 2 Retries remaining: 5<br>
>> (insecure) Fri Oct 14 12:49:42 2016<br>
>><br>
>> 18067 www 20 0 1321m 250m 1224 S 0.0 3.2 21:45.86 lonc:<br>
>> <a href="http://loncapa05.purdue.edu" rel="noreferrer" target="_blank">loncapa05.purdue.edu</a> Connection count: 2 Retries remaining: 5<br>
>> (insecure) Fri Oct 14 12:49:41 2016<br>
>><br>
>> 21139 www 20 0 1321m 250m 1224 S 0.0 3.2 21:57.04 lonc:<br>
>> <a href="http://loncapa07.purdue.edu" rel="noreferrer" target="_blank">loncapa07.purdue.edu</a> Connection count: 2 Retries remaining: 5<br>
>> (insecure) Fri Oct 14 12:49:41 2016<br>
>><br>
>> 21150 www 20 0 1321m 248m 1224 S 0.0 3.2 22:11.91 lonc:<br>
>> <a href="http://loncapa04.purdue.edu" rel="noreferrer" target="_blank">loncapa04.purdue.edu</a> Connection count: 2 Retries remaining: 5<br>
>> (insecure) Fri Oct 14 12:49:42 2016<br>
>><br>
>> 21151 www 20 0 1321m 253m 1224 S 0.0 3.2 21:48.87 lonc:<br>
>> <a href="http://loncapa06.purdue.edu" rel="noreferrer" target="_blank">loncapa06.purdue.edu</a> Connection count: 2 Retries remaining: 5<br>
>> (insecure) Fri Oct 14 12:49:42 2016<br>
>><br>
>> 22900 www 20 0 182m 8756 1972 S 0.0 0.1 0:00.93 lonc:<br>
>> <a href="http://loncapa03.purdue.edu" rel="noreferrer" target="_blank">loncapa03.purdue.edu</a> Connection count: 1 Retries remaining: 5<br>
>> (insecure) Fri Oct 14 12:49:41 2016<br>
>><br>
>> 29226 www 20 0 184m 8900 2060 S 0.0 0.1 0:00.04 lonc:<br>
>> <a href="http://capa4.phy.ohio.edu" rel="noreferrer" target="_blank">capa4.phy.ohio.edu</a> Connection count: 3 Retries remaining: 5<br>
>> (insecure) Fri Oct 14 12:49:42 2016<br>
>><br>
>> 29419 www 20 0 182m 8776 1972 S 0.0 0.1 0:00.11 lonc:<br>
>> <a href="http://loncapa.purdue.edu" rel="noreferrer" target="_blank">loncapa.purdue.edu</a> Connection count: 1 Retries remaining: 5<br>
>> (insecure) Fri Oct 14 12:49:41 2016<br>
>><br>
>><br>
>><br>
>><br>
>><br>
>> Anyone else see this?<br>
>><br>
>><br>
>><br>
>> Thanks,<br>
>><br>
>> Mike Budzik<br>
>><br>
>> Interim Manager, Student Systems and Web Services Admin<br>
>><br>
>> IT Infrastructure - Purdue University<br>
<br>
______________________________<wbr>_________________<br>
LON-CAPA-admin mailing list<br>
<a href="mailto:LON-CAPA-admin@mail.lon-capa.org">LON-CAPA-admin@mail.lon-capa.<wbr>org</a><br>
<a href="http://mail.lon-capa.org/mailman/listinfo/lon-capa-admin" rel="noreferrer" target="_blank">http://mail.lon-capa.org/<wbr>mailman/listinfo/lon-capa-<wbr>admin</a><br>
______________________________<wbr>_________________<br>
LON-CAPA-admin mailing list<br>
<a href="mailto:LON-CAPA-admin@mail.lon-capa.org">LON-CAPA-admin@mail.lon-capa.<wbr>org</a><br>
<a href="http://mail.lon-capa.org/mailman/listinfo/lon-capa-admin" rel="noreferrer" target="_blank">http://mail.lon-capa.org/<wbr>mailman/listinfo/lon-capa-<wbr>admin</a><br>
</div></div></blockquote></div><br></div>