[LON-CAPA-admin] unable to contact DNS

Wed Nov 11 15:04:09 EST 2015

After some more investigation, it appears that both the network service and
the NetworkManager service are trying to bring up the network simultaneously
during startup.  Stopping and disabling the NetworkManager service seems to
have fixed the failing network startups, and the 'unable to contact DNS'
message no longer appears in lonnet.log on the 2 servers that I tried this on.
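
For anyone wanting to repeat this, here is a minimal sketch of the steps
described above. It assumes the stock CentOS 7 unit name
NetworkManager.service and that the legacy network service is meant to stay
in charge of the interfaces. The function names and the DRY_RUN variable are
made up for illustration and are not part of LON-CAPA; with DRY_RUN=1 (the
default here) the systemctl commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch only: stop and disable NetworkManager so that only the legacy
# "network" service brings up the interfaces at boot.
# DRY_RUN, run, disable_network_manager and count_dns_failures are
# illustrative names, not part of LON-CAPA or CentOS.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop the running NetworkManager and keep it from starting on next boot.
disable_network_manager() {
    run systemctl stop NetworkManager.service
    run systemctl disable NetworkManager.service
}

# Count 'unable to contact DNS' lines in the log file given as $1.
# The location of lonnet.log is not assumed here; pass it explicitly.
count_dns_failures() {
    grep -c 'unable to contact DNS' "$1"
}

disable_network_manager
```

Run it as root with DRY_RUN=0 to actually apply the change. After the next
reboot, calling count_dns_failures with the path to your lonnet.log should
show whether the message is still being logged.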

Bob Gonzales
Binghamton University
Chemistry Dept

On Wed, Nov 11, 2015 at 9:08 AM, Bob Gonzales <rgonzal at binghamton.edu>
wrote:

> Stuart,
>
> A quick look at /var/log/messages on the servers with the 'unable to
> contact DNS' message shows that the network is failing to start before
> loncontrol runs.  It does try to start up again later in the boot process
> and is successful.  This doesn't always seem to happen this way.
> Sometimes the network does start successfully the first time.
>
> So, I think I will attempt to see why I have the inconsistency in start up,
> and if I can't determine the cause conclusively then I will do the
> conversion to systemd.
>
> Also, I will contact you about my performance issues.
>
> Thank you for your time and effort.
>
> Bob Gonzales
> Binghamton University
> Chemistry
>
> On Mon, Nov 9, 2015 at 1:29 PM, Stuart Raeburn <raeburn at msu.edu> wrote:
>
>> Bob,
>>
>> One of the changes between CentOS 5/6 and 7 is the use of systemd (and
>> systemctl commands) instead of SysV to control start-up of services on boot.
>>
>> That said, the previously used SysV services -- including loncontrol
>> which starts the LON-CAPA daemons and updates iptables rules -- continue to
>> work in CentOS 7.
>>
>> /sbin/chkconfig --list
>>
>> will show those.
>>
>> If you look in /var/log/messages after boot you should see the lines:
>>
>> loncontrol: Opening firewall access on port 5663
>> loncontrol: Starting LON-CAPA
>>
>> indicating that LON-CAPA was started.
>>
>> The message logged in lonnet.log ("unable to contact DNS defaulting to on
>> disk file") originates in lonnet::get_dns(), which is called when
>> information is needed about cluster membership (and the cache has not been
>> populated).
>>
>> If you run:
>>
>> /etc/init.d/loncontrol restart
>>
>> after your name service is available then that will cause the daemons to
>> be restarted (and Apache reloaded), which will include retrieval of cluster
>> membership information (and caching).  Therefore it is equivalent to the
>> operation of /etc/init.d/loncontrol start (on boot) with name service
>> already available.
>>
>> On LON-CAPA instances I manage on CentOS 7 I do not see this issue on
>> boot, when lonnet::get_dns() is called.  You might look in
>> /var/log/messages to see when start up of your name service occurs
>> following boot.
>>
>> In CentOS 7 the idea behind using systemd is to make the start up process
>> faster by starting services in parallel, as much as possible, and you can
>> control this process by modifying files in /usr/lib/systemd/system.
>>
>> If you want to convert management of loncontrol from SysV to systemd you
>> should add a file:
>>
>> loncontrol.service
>>
>> to /usr/lib/systemd/system with the following contents:
>>
>> [Unit]
>> Description=Manage LON-CAPA daemons and update iptables rules for port 5663
>> Wants=network-online.target nss-lookup.target
>> After=network-online.target nss-lookup.target syslog.target basic.target
>>
>> [Service]
>> RemainAfterExit=yes
>> ExecStart=/etc/init.d/loncontrol start
>> ExecReload=/etc/init.d/loncontrol reload
>> ExecStop=/etc/init.d/loncontrol stop
>> StandardOutput=syslog
>> StandardError=syslog
>>
>> [Install]
>> WantedBy=multi-user.target
>>
>> then use the command systemctl enable loncontrol.service
>>
>> (Note: if you make changes to files in /usr/lib/systemd/system you should
>> then do: systemctl daemon-reload).
>>
>> Once loncontrol is converted from SysV to systemd ...
>>
>> To start loncontrol use:
>> systemctl start loncontrol.service
>>
>> to stop loncontrol use:
>> systemctl stop loncontrol.service
>>
>> to restart loncontrol use:
>> systemctl restart loncontrol.service
>>
>> and to reload lonc and lond use:
>> systemctl reload loncontrol.service
>>
>> Output from these commands will be available using:
>>
>> systemctl status loncontrol.service
>>
>> or by looking in /var/log/messages.
>>
>>> ... But, since I'm still having some
>>> serious performance issues when high numbers of students log in for a quiz,
>>> I was curious if this is something worth looking into more deeply.
>>>
>>
>> The relative timing of starting of loncontrol and your named service on
>> reboot is not related to your performance issues, as long as you ran
>> /etc/init.d/loncontrol restart after the named service had become available.
>>
>> If you are encountering performance issues on your servers at times of
>> peak usage you might want to modify the lonLoadLim or lonUserLoadLim values
>> on your servers.
>>
>> If students log-in via a load balancer, and you want sessions to be
>> offloaded to LON-CAPA servers elsewhere in the network, at times when all
>> the binghamton servers are overloaded, you should ensure that:
>>
>> (a) Domain settings for "Dedicated Load Balancer(s)" include the
>> binghamton access servers in the "Default destinations" in the "Offloads
>> to: primary" category.
>>
>> and
>>
>> (b) Domain settings for "User session hosting/offloading" for the
>> loadbalancer machine/VM include the MSU access servers -- msua1, msua2,
>> msua3, msua4 in the "default" category.
>>
>> It would be useful to have information about server loads etc. at these
>> times when large numbers of students log-in (and you see these performance
>> issues).  I can provide scripts to gather load data (and display in MRTG).
>> Contact me off-list if you are interested.
>>
>>
>> Stuart Raeburn
>> LON-CAPA Academic Consortium
>>
>>
>>
>> Quoting Bob Gonzales <rgonzal at binghamton.edu>:
>>
>>> Hi,
>>>
>>> After upgrading CentOS to 7.1 and LON-CAPA to 2.11 this summer I often get
>>> a message like this after a reboot:
>>>
>>> Sun Nov  8 09:19:49 2015 (1709): unable to contact DNS defaulting to on
>>> disk file dns_domain.tab
>>>
>>> and then I get a lot of messages like this:
>>>
>>> Sun Nov  8 09:19:49 2015 (1052): Name s4.lite.msu.edu no IP found
>>>
>>> If I restart LON-CAPA via '/etc/init.d/loncontrol restart', the messages
>>> don't appear in lonnet.log but I don't know if the same initialization
>>> happens so that might, or might not, be OK.  The lonnet.log doesn't show
>>> these 'no IP found' messages when loncron does its nightly run the next day
>>> but, again, I don't know if the same initialization happens then either.
>>>
>>>
>>> I've assumed that it is the result of the name service in CentOS not having
>>> finished its startup before LON-CAPA started and that it resolves itself,
>>> because I can log in right after the reboot and ping various non-LON-CAPA
>>> and local LON-CAPA machines by name.  But, since I'm still having some
>>> serious performance issues when high numbers of students log in for a quiz,
>>> I was curious if this is something worth looking into more deeply.
>>>
>>>
>>> Thanks,
>>> Bob Gonzales
>>> Binghamton University
>>> Chemistry Dept
>>> rgonzal at binghamton.edu
>>>
>>>