[LON-CAPA-admin] unable to contact DNS

Stuart Raeburn raeburn at msu.edu
Mon Nov 9 13:29:46 EST 2015


Bob,

One of the changes between CentOS 5/6 and 7 is the use of systemd (and  
systemctl commands) instead of SysV to control start-up of services on  
boot.

That said, the previously used SysV services -- including loncontrol,
which starts the LON-CAPA daemons and updates iptables rules --
continue to work in CentOS 7.

/sbin/chkconfig --list

will show those.

If you look in /var/log/messages after boot you should see the lines:

loncontrol: Opening firewall access on port 5663
loncontrol: Starting LON-CAPA

indicating that LON-CAPA was started.

The message logged in lonnet.log ("unable to contact DNS defaulting to  
on disk file") originates in lonnet::get_dns(), which is called when  
information is needed about cluster membership (and the cache has not  
been populated).

If you run:

/etc/init.d/loncontrol restart

after your name service is available then that will cause the daemons  
to be restarted (and Apache reloaded), which will include retrieval of  
cluster membership information (and caching).  Therefore it is  
equivalent to the operation of /etc/init.d/loncontrol start (on boot)  
with name service already available.
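
A minimal sketch of waiting for name service before doing that restart.
Here wait_for_dns is a hypothetical helper (not part of LON-CAPA), the
retry count and delay are arbitrary defaults, and it assumes getent is
available on the server:

```shell
#!/bin/sh
# wait_for_dns: hypothetical helper, not part of LON-CAPA.
# Returns 0 as soon as HOST resolves, 1 if it never does within TRIES attempts.
wait_for_dns() {
    host="$1"          # host name to look up
    tries="${2:-30}"   # maximum number of attempts (arbitrary default)
    delay="${3:-2}"    # seconds to sleep between attempts (arbitrary default)
    i=0
    while [ "$i" -lt "$tries" ]; do
        # getent resolves through the Name Service Switch (nsswitch.conf)
        if getent hosts "$host" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example use (as root), with a host name taken from the lonnet.log messages:
# wait_for_dns s4.lite.msu.edu && /etc/init.d/loncontrol restart
```

The restart is left commented out so that the helper can be tried on its
own first.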

On LON-CAPA instances I manage on CentOS 7 I do not see this issue on  
boot, when lonnet::get_dns() is called.  You might look in  
/var/log/messages to see when start up of your name service occurs  
following boot.
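
One way to compare the two start times is sketched below.
show_boot_order is a hypothetical helper, and it assumes the name
service logs under the tag "named" and that the log uses the usual
"host tag: message" or "host tag[pid]: message" syslog layout:

```shell
#!/bin/sh
# show_boot_order: hypothetical helper, not part of LON-CAPA.
# Prints the loncontrol and named lines from a syslog file, in the order
# they were logged, so their relative timing after boot can be compared.
show_boot_order() {
    logfile="${1:-/var/log/messages}"
    # Match "loncontrol: ..." and "named[1234]: ..." style tags only
    grep -E ' (loncontrol|named)(\[[0-9]+\])?: ' "$logfile"
}

# Example use (as root):
# show_boot_order /var/log/messages | tail -n 40
```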

In CentOS 7 the idea behind using systemd is to make the start up  
process faster by starting services in parallel, as much as possible,  
and you can control this process by modifying files in  
/usr/lib/systemd/system.

If you want to convert management of loncontrol from SysV to systemd  
you should add a file:

loncontrol.service

to /usr/lib/systemd/system with the following contents:

[Unit]
Description=Manage LON-CAPA daemons and update iptables rules for port 5663
Wants=network-online.target nss-lookup.target
After=network-online.target nss-lookup.target syslog.target basic.target

[Service]
RemainAfterExit=yes
ExecStart=/etc/init.d/loncontrol start
ExecReload=/etc/init.d/loncontrol reload
ExecStop=/etc/init.d/loncontrol stop
StandardOutput=syslog
StandardError=syslog

[Install]
WantedBy=multi-user.target

then use the command systemctl enable loncontrol.service

(Note: if you make changes to files in /usr/lib/systemd/system you  
should then do: systemctl daemon-reload).
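
As a quick sanity check on the unit file text, the sketch below tests
whether a given target appears on its After= line. unit_orders_after is
a hypothetical helper; it only reads the file and does not ask systemd
what it has actually loaded (systemctl show -p After loncontrol.service
reports that):

```shell
#!/bin/sh
# unit_orders_after: hypothetical helper, not part of LON-CAPA or systemd.
# Returns 0 if TARGET is listed on an After= line of UNIT_FILE, 1 otherwise.
unit_orders_after() {
    unit_file="$1"
    target="$2"
    # Split "After=a b c" into one word per line, then look for an exact match
    grep -E '^After=' "$unit_file" | tr ' =' '\n\n' | grep -qxF "$target"
}

# Example use:
# unit_orders_after /usr/lib/systemd/system/loncontrol.service nss-lookup.target \
#     && echo "ordered after nss-lookup.target"
```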

Once loncontrol is converted from SysV to systemd ...

To start loncontrol use:
systemctl start loncontrol.service

to stop loncontrol use:
systemctl stop loncontrol.service

to restart loncontrol use:
systemctl restart loncontrol.service

and to reload lonc and lond use:
systemctl reload loncontrol.service

Output from these commands will be available using:

systemctl status loncontrol.service

or by looking in /var/log/messages.

> ... But, since I'm still having some
> serious performance issues when high numbers of students log in for a quiz,
> I was curious if this is something worth looking into more deeply.

The relative timing of starting of loncontrol and your named service
on reboot is not related to your performance issues, as long as you ran
/etc/init.d/loncontrol restart after the named service had become
available.

If you are encountering performance issues on your servers at times of  
peak usage you might want to modify the lonLoadLim or lonUserLoadLim  
values on your servers.

If students log in via a load balancer, and you want sessions to be
offloaded to LON-CAPA servers elsewhere in the network at times when
all the binghamton servers are overloaded, you should ensure that:

(a) Domain settings for "Dedicated Load Balancer(s)" include the  
binghamton access servers in the "Default destinations" in the  
"Offloads to: primary" category.

and

(b) Domain settings for "User session hosting/offloading" for the  
loadbalancer machine/VM include the MSU access servers -- msua1,  
msua2, msua3, msua4 in the "default" category.

It would be useful to have information about server loads etc. at  
these times when large numbers of students log in (and you see these
performance issues).  I can provide scripts to gather load data (and  
display in MRTG).  Contact me off-list if you are interested.


Stuart Raeburn
LON-CAPA Academic Consortium


Quoting Bob Gonzales <rgonzal at binghamton.edu>:

> Hi,
>
> After upgrading Centos to 7.1 and Lon-capa to 2.11 this summer I often get
> a message like this after a reboot:
>
> Sun Nov  8 09:19:49 2015 (1709): unable to contact DNS defaulting to on
> disk file dns_domain.tab
>
> and then I get a lot of messages like this:
>
> Sun Nov  8 09:19:49 2015 (1052): Name s4.lite.msu.edu no IP found
>
> If I restart Lon-capa via  '/etc/init.d/loncontrol restart', the messages
> don't appear in lonnet.log but I don't know if the same initialization
> happens so that might, or might not, be OK.  The lonnet.log doesn't show
> these 'no IP found' messages when loncron does it nightly run the next day
> but, again, I don't know if they same initialization happens then either.
>
>
> I've assumed that it is the result of the name service in Centos not having
> finished it's startup before Lon-capa started and that it resolves itself
> because I can log in right after the reboot and ping various non-Lon-capa,
> and local lon-capa machines by name.  But, since I'm still having some
> serious performance issues when high numbers of students log in for a quiz,
> I was curious if this is something worth looking into more deeply.
>
>
> Thanks,
> Bob Gonzales
> Binghamton University
> Chemistry Dept
> rgonzal at binghamton.edu
>




