[LON-CAPA-admin] unable to contact DNS
Stuart Raeburn
raeburn at msu.edu
Mon Nov 9 13:29:46 EST 2015
Bob,
One of the changes between CentOS 5/6 and 7 is the use of systemd (and
systemctl commands) instead of SysV to control start-up of services on
boot.
That said, the previously used SysV services (including loncontrol,
which starts the LON-CAPA daemons and updates iptables rules)
continue to work in CentOS 7.
/sbin/chkconfig --list
will show those.
If you look in /var/log/messages after boot you should see the lines:
loncontrol: Opening firewall access on port 5663
loncontrol: Starting LON-CAPA
indicating that LON-CAPA was started.
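Here is a minimal sketch of that check. It assumes a POSIX shell and the CentOS 7 default syslog file. The function name loncapa_boot_check is hypothetical. /var/log/messages accumulates entries across reboots until it is rotated, so a match may come from an earlier boot. Check the timestamps with grep loncontrol /var/log/messages as well.

```shell
# Hypothetical helper: report whether both loncontrol start-up lines
# appear in a syslog file (default: /var/log/messages).
loncapa_boot_check() {
    logfile=${1:-/var/log/messages}
    # -q: exit status only; -F: treat the message text as a fixed string
    if grep -qF 'loncontrol: Opening firewall access on port 5663' "$logfile" &&
        grep -qF 'loncontrol: Starting LON-CAPA' "$logfile"; then
        echo "LON-CAPA start-up lines found in $logfile"
        return 0
    fi
    echo "LON-CAPA start-up lines not found in $logfile"
    return 1
}
```

Run it as root, since /var/log/messages is normally readable only by root.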
The message logged in lonnet.log ("unable to contact DNS defaulting to
on disk file") originates in lonnet::get_dns(), which is called when
information is needed about cluster membership (and the cache has not
been populated).
If you run:
/etc/init.d/loncontrol restart
after your name service is available, the daemons will be restarted
(and Apache reloaded). This includes retrieving (and caching) cluster
membership information, so it is equivalent to running
/etc/init.d/loncontrol start at boot with the name service already
available.
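To automate "restart once the name service is up", one option is to poll name resolution first. This is a minimal sketch that assumes getent(1) is available; it is part of glibc on CentOS. The function wait_for_dns and its default values are hypothetical. s4.lite.msu.edu, the cluster host from the log messages quoted below, serves only as an example name. A successful lookup shows only that resolution works for that one name.

```shell
# Hypothetical helper: poll until a host name resolves (return 0),
# giving up after a fixed number of attempts (return 1).
# getent consults the NSS configuration in /etc/nsswitch.conf.
wait_for_dns() {
    host=$1
    tries=${2:-30}      # maximum number of lookups
    interval=${3:-2}    # seconds to sleep between lookups
    n=1
    while [ "$n" -le "$tries" ]; do
        if getent hosts "$host" >/dev/null 2>&1; then
            return 0
        fi
        # do not sleep after the final failed attempt
        if [ "$n" -lt "$tries" ]; then
            sleep "$interval"
        fi
        n=$((n + 1))
    done
    return 1
}
```

For example, wait_for_dns s4.lite.msu.edu 30 2 && /etc/init.d/loncontrol restart waits up to about a minute and restarts loncontrol only if the name resolves.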
On LON-CAPA instances I manage on CentOS 7, I do not see this issue
when lonnet::get_dns() is called on boot. You might look in
/var/log/messages to see when your name service starts following
boot.
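One way to compare the two start-up times is to print the most recent matching line for each service. This is a minimal sketch, and the function name is hypothetical. The pattern 'starting BIND' assumes your name service is BIND's named; adjust it for whatever resolver you run. On CentOS 7, journalctl -b restricts the journal to the current boot, which avoids matches from earlier boots.

```shell
# Hypothetical helper: for each pattern, print the last matching line in a
# syslog file, or "no match: PATTERN" if nothing matches.
last_log_lines() {
    logfile=$1
    shift
    for pattern in "$@"; do
        # the last match is the most recent occurrence in the file
        line=$(grep -e "$pattern" "$logfile" | tail -n 1)
        if [ -n "$line" ]; then
            printf '%s\n' "$line"
        else
            printf 'no match: %s\n' "$pattern"
        fi
    done
}
```

For example: last_log_lines /var/log/messages 'starting BIND' 'loncontrol: Starting LON-CAPA'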
In CentOS 7, the idea behind using systemd is to make the start-up
process faster by starting services in parallel as much as possible;
you can control this process by modifying files in
/usr/lib/systemd/system.
If you want to convert management of loncontrol from SysV to systemd
you should add a file:
loncontrol.service
to /usr/lib/systemd/system with the following contents:
[Unit]
Description=Manage LON-CAPA daemons and update iptables rules for port 5663
Wants=network-online.target nss-lookup.target
After=network-online.target nss-lookup.target syslog.target basic.target

[Service]
RemainAfterExit=yes
ExecStart=/etc/init.d/loncontrol start
ExecReload=/etc/init.d/loncontrol reload
ExecStop=/etc/init.d/loncontrol stop
StandardOutput=syslog
StandardError=syslog

[Install]
WantedBy=multi-user.target
then use the command systemctl enable loncontrol.service
(Note: if you make changes to files in /usr/lib/systemd/system you
should then do: systemctl daemon-reload).
Once loncontrol has been converted from SysV to systemd:
To start loncontrol use:
systemctl start loncontrol.service
to stop loncontrol use:
systemctl stop loncontrol.service
to restart loncontrol use:
systemctl restart loncontrol.service
and to reload lonc and lond use:
systemctl reload loncontrol.service
Output from these commands will be available using:
systemctl status loncontrol.service
or by looking in /var/log/messages.
> ... But, since I'm still having some
> serious performance issues when high numbers of students log in for a quiz,
> I was curious if this is something worth looking into more deeply.
The relative timing of the start-up of loncontrol and your name service
on reboot is not related to your performance issues, as long as you ran
/etc/init.d/loncontrol restart after the name service had become
available.
If you are encountering performance issues on your servers at times of
peak usage you might want to modify the lonLoadLim or lonUserLoadLim
values on your servers.
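Before changing these values, you can list the PerlSetVar lines for both variables to see the current settings. The sketch below does this. The default path /etc/httpd/conf/loncapa_apache.conf is an assumption about where your installation sets them; adjust it if they live elsewhere. The function name is hypothetical. After editing Apache configuration, Apache must be reloaded for the change to take effect.

```shell
# Hypothetical helper: list the lonLoadLim and lonUserLoadLim PerlSetVar
# settings in an Apache config file. The default path is an assumption.
show_load_limits() {
    conf=${1:-/etc/httpd/conf/loncapa_apache.conf}
    # -E: extended regexp; (User)? matches both lonLoadLim and lonUserLoadLim.
    # Commented-out lines (starting with '#') do not match.
    grep -E '^[[:space:]]*PerlSetVar[[:space:]]+lon(User)?LoadLim[[:space:]]' "$conf"
}
```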
If students log in via a load balancer, and you want sessions to be
offloaded to LON-CAPA servers elsewhere in the network when all the
Binghamton servers are overloaded, you should ensure that:
(a) Domain settings for "Dedicated Load Balancer(s)" include the
Binghamton access servers in the "Default destinations" in the
"Offloads to: primary" category.
and
(b) Domain settings for "User session hosting/offloading" for the
load balancer machine/VM include the MSU access servers (msua1,
msua2, msua3, msua4) in the "default" category.
It would be useful to have information about server loads etc. at
the times when large numbers of students log in (and you see these
performance issues). I can provide scripts to gather load data (and
display it in MRTG). Contact me off-list if you are interested.
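If you want to record load averages yourself, here is a minimal sketch. It is not the scripts mentioned above. It appends one timestamped sample from the Linux /proc/loadavg file to a CSV file. The function name and the default output path /var/tmp/loadavg.csv are assumptions.

```shell
# Hypothetical helper: append "timestamp,1min,5min,15min" to a CSV file,
# reading the Linux /proc/loadavg format ("0.52 0.48 0.40 2/345 6789").
log_load_sample() {
    src=${1:-/proc/loadavg}
    out=${2:-/var/tmp/loadavg.csv}
    # The first three fields are the 1-, 5- and 15-minute load averages;
    # the remainder of the line is read into "rest" and ignored.
    read -r one five fifteen rest < "$src" || return 1
    printf '%s,%s,%s,%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$one" "$five" "$fifteen" >> "$out"
}
```

If a small script that calls this function is run from cron every minute, it would produce a load history covering a quiz period.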
Stuart Raeburn
LON-CAPA Academic Consortium
Quoting Bob Gonzales <rgonzal at binghamton.edu>:
> Hi,
>
> After upgrading Centos to 7.1 and Lon-capa to 2.11 this summer I often get
> a message like this after a reboot:
>
> Sun Nov 8 09:19:49 2015 (1709): unable to contact DNS defaulting to on
> disk file dns_domain.tab
>
> and then I get a lot of messages like this:
>
> Sun Nov 8 09:19:49 2015 (1052): Name s4.lite.msu.edu no IP found
>
> If I restart Lon-capa via '/etc/init.d/loncontrol restart', the messages
> don't appear in lonnet.log but I don't know if the same initialization
> happens so that might, or might not, be OK. The lonnet.log doesn't show
> these 'no IP found' messages when loncron does its nightly run the next day
> but, again, I don't know if the same initialization happens then either.
>
>
> I've assumed that it is the result of the name service in Centos not having
> finished its startup before Lon-capa started and that it resolves itself
> because I can log in right after the reboot and ping various non-Lon-capa,
> and local lon-capa machines by name. But, since I'm still having some
> serious performance issues when high numbers of students log in for a quiz,
> I was curious if this is something worth looking into more deeply.
>
>
> Thanks,
> Bob Gonzales
> Binghamton University
> Chemistry Dept
> rgonzal at binghamton.edu
>