[LON-CAPA-admin] {Disarmed} Re: Published resources not showing up on access servers

Bob Gonzales rgonzal at binghamton.edu
Sun Sep 20 09:41:23 EDT 2015


Just to finish this up.

My 'Map not loaded:' problem disappeared this morning.  I'm assuming that
the nightly run of loncontrol on all the servers took care of that.

A reboot and then an additional restart via loncontrol on my library server
took care of published items not replicating.   The 'binghamtonaX (take
your pick of 2,3,5,6,7 for ):not found' message from the subscription files
seems to point to a dns name resolution problem.  This might be due to some
network function in Centos 7 not finishing it's startup before loncontrol
runs.  I have a similar issue on another server where the Darwin streaming
server won't run after a reboot until your restart the Darwin daemon.

Thanks to Stuart for his time and continued documenting of how things work
in loncapa, it's extremely helpful.

Bob Gonzales
Binghamton University
Chemistry Dept


On Sat, Sep 19, 2015 at 4:24 PM, Bob Gonzales <rgonzal at binghamton.edu>
wrote:

> Published resources seem to be replicating now.
>
> I'm guessing that is it because of the reboot and then the restart of
> loncontrol.  I will go through the /var/log/messages file to see if can see
> anyway that loncontrol ran before dns services were available.
>
> Now, I just have to figure out the following message when I try to log
> into my test course:
>
> Map not loaded: The file /home/httpd/html/userfiles/binghamton/
> 71215825fc2344762binghamtonl1/default.sequence does not exist.
>
> Thanks for all the help so far, Stuart!
>
> Bob Gonzales
> Binghamton University
> Chemistry Dept
>
> On Sat, Sep 19, 2015 at 12:15 PM, Stuart Raeburn <raeburn at msu.edu> wrote:
>
>> Bob,
>>
>> The subscription file shows the access server is subscribed.
>>>   binghamtona2:128.226.130.9:1442502030
>>>
>>
>> The timestamp 1442502030 converts to: Thu Sep 17 11:00:30 EDT
>> ( http://www.epochconverter.com/ )
>>
>> which is after the (Thu Sep 17 10:49:37 EDT) time of the:
>> "access failed for 128.226.130.9, reason: binghamtona2 not subscribed"
>> message.
>>
>> You might check that your servers are all running ntpd, and have the
>> correct date/time.
>>
>> It is possible that if binghamtona2 had already been subscribed at
>> 10:49:37, you could still see an error message: "binghamtona2 not
>> subscribed", if there whad been a problem with the DNS look-up on your
>> library server for the hostname: loncapa2.chem.binghamton.edu.
>>
>> lonracc.pm which authorizes access to files in /raw/ includes:
>>
>> my (undef,undef,undef,undef,$ip) = gethostbyname($hostname);
>> return 0 if (length($ip) != 4);
>> $ip=inet_ntoa($ip);
>>
>> where $hostname would be loncapa2.chem.binghamton.edu and $ip would be
>> 128.226.130.9
>>
>> It is possible then that there is a issue with the DNS service that is
>> used to resolve the hostname: loncapa2.chem.binghamton.edu
>>
>> I see binghamton.edu in the dns_hosts.tab and and hosts.tab on the
>>> library
>>> server but I see chem.binghamton.edu in the access servers hosts.tab.
>>> My
>>> dns domain is chem.binghamton.edu and I do remember using that during
>>> the
>>> installation on each machine this summer.  Could this be it?
>>>
>>>
>> The last entry in your server's record in hosts.tab is what is referred
>> to in LON-CAPA as the "internet domain".  I do not believe that is the
>> source of the problem here.  That value is used as a way to combine an
>> institution's servers together into a group, if you have set the domain
>> configuration to restrict user session hosting for users from domains
>> outside your institution.  Although you should set it to be the same on all
>> your servers, it does not currently play a role in controlling content
>> replication.
>>
>> So it should be set to binghamton.edu on all your servers. for
>> consistency with what is present in the "authoritative" cluster tables,
>> e.g., https://s10.lite.msu.eu/adm/dns/hosts
>>
>> It would only make sense to set it to chem.binghamton.edu if another
>> department at Binghamton installed LON-CAPA on their own servers, and you
>> wanted to prohibit hosting of your user sessions on their servers, or vice
>> versa.
>>
>>
>>
>> Stuart Raeburn
>> LON-CAPA Academic Consortium
>>
>>
>> Quoting Bob Gonzales <rgonzal at binghamton.edu>:
>>
>> Stuart,
>>>
>>> In my library servier httpd error_log I see:
>>>
>>> [Thu Sep 17 10:49:37.441421 2015] [:error] [pid 2605] access to
>>> /home/httpd/html/raw/binghamton/gonzales/Testing Area/foobar.problem
>>> failed
>>> for 128.226.130.9, reason: binghamtona2 not subscribed
>>>
>>> The subscription file shows the access server is subscribed.
>>>
>>>   binghamtona2:128.226.130.9:1442502030
>>>
>>>
>>> Using the info in the referenced post
>>>
>>> http://mail.lon-capa.org/pipermail/lon-capa-admin/2013-November/002773.html
>>> ,
>>> I'm thinking a name/ip problem.
>>>
>>> So, regarding entries in /etc/hosts, there are only the default ones for
>>> 127.0.0.1 and ::1 on all the machines.
>>>
>>> Looking at dns_hosts.tab  I see this on all the servers (access and
>>> library):
>>>
>>> binghamtonl1:binghamton:library:loncapal1.chem.binghamton.edu:http:
>>> binghamton.edu
>>> binghamtona2:binghamton:access:loncapa2.chem.binghamton.edu:http:
>>> binghamton.edu
>>> binghamtona3:binghamton:access:loncapa3.chem.binghamton.edu:http:
>>> binghamton.edu
>>> binghamtona4:binghamton:access:loncapa.chem.binghamton.edu:http:
>>> binghamton.edu
>>> binghamtona5:binghamton:access:loncapa5.chem.binghamton.edu:http:
>>> binghamton.edu
>>> binghamtona6:binghamton:access:loncapa6.chem.binghamton.edu:http:
>>> binghamton.edu
>>> binghamtona7:binghamton:access:loncapa7.chem.binghamton.edu:http:
>>> binghamton.edu
>>>
>>> Looking at hosts I see this on the library server:
>>>
>>> ^s10.lite.msu.edu
>>> ^dalton.chem.sfu.ca
>>> ^library1.lon-capa.uiuc.edu
>>> binghamtonl1:binghamton:library:loncapal1.chem.binghamton.edu:http:
>>> binghamton.edu
>>>
>>> and this on one of the access servers:
>>> ^s10.lite.msu.edu
>>> ^dalton.chem.sfu.ca
>>> ^library1.lon-capa.uiuc.edu
>>> binghamtona2:binghamton:access:loncapa2.chem.binghamton.edu:http:
>>> chem.binghamton.edu
>>>
>>> I see binghamton.edu in the dns_hosts.tab and and hosts.tab on the
>>> library
>>> server but I see chem.binghamton.edu in the access servers hosts.tab.
>>> My
>>> dns domain is chem.binghamton.edu and I do remember using that during
>>> the
>>> installation on each machine this summer.  Could this be it?
>>>
>>> Thanks,
>>> Bob Gonzales
>>>
>>>
>>> On Thu, Sep 17, 2015 at 5:23 PM, Stuart Raeburn <raeburn at msu.edu> wrote:
>>>
>>> Hi Bob,
>>>>
>>>> Looking at lonc.log on the MSU LON-CAPA library server s10 I see that
>>>> the
>>>> run of /home/httpd/perl/loncron used to update connections and retrieve
>>>> updated cluster membership information failed to connect to
>>>> binghamtona2 at
>>>> 5:11 am today.
>>>>
>>>> On the MSU library server ...
>>>>
>>>> Thu Sep 17 05:11:14 2015 (2148) [loncapa2.chem.binghamton.edu] [Thu Sep
>>>> 17 05:11:14 2015: Parent keeping the flock] <font color='red'>CRITICAL:
>>>> Forking server for loncapa2.chem.binghamton.edu</font>
>>>> Thu Sep 17 05:11:18 2015 (13722) [loncapa2.chem.binghamton.edu] [Thu
>>>> Sep
>>>> 17 05:11:14 2015: Connected to loncapa2.chem.binghamton.edu] <font
>>>> color='red'>CRITICAL: Failed to make a connection with lond.</font>
>>>> Thu Sep 17 05:11:18 2015 (13722) [loncapa2.chem.binghamton.edu] [Thu
>>>> Sep
>>>> 17 05:11:14 2015: Connected to loncapa2.chem.binghamton.edu] <font
>>>> color='blue'>WARNING: Failing transaction sethost</font>
>>>>
>>>> As a result the lonc status reported on s10 for
>>>> loncapa2.chem.binghamton.edu was:
>>>>
>>>> www      13722  2148  0 05:11 ?        00:00:00 lonc:
>>>> loncapa2.chem.binghamton.edu Connection count: 0 Retries remaining: 5
>>>> ()
>>>> Thu Sep 17 13:29:08 2015
>>>>
>>>> However, when I attempted to connect to binghamtona2 a few hours ago the
>>>> connection was successfully created:
>>>>
>>>> www      13722  2148  0 05:11 ?        00:00:00 lonc:
>>>> loncapa2.chem.binghamton.edu Connection count: 1 Retries remaining: 5
>>>> (insecure) Thu Sep 17 13:34:39 2015
>>>>
>>>> and I was able to transfer my user session there.
>>>>
>>>> From the lonc.log information on your library server ...
>>>>
>>>>    Thu Sep 17 11:05:19 2015 (3279) [loncapa2.chem.binghamton.edu] [Thu
>>>> Sep
>>>>
>>>>> 17 10:53:24 2015: loncapa2.chem.binghamton.edu Connection count: 0
>>>>> Retries
>>>>> remaining: 5 ()] <font color='green'>SUCCESS: Connection 1 to
>>>>> loncapa2.chem.binghamton.edu now ready for action</font>
>>>>>    Thu Sep 17 11:10:20 2015 (3279) [loncapa2.chem.binghamton.edu] [Thu
>>>>> Sep
>>>>> 17 11:10:20 2015: loncapa2.chem.binghamton.edu Connection count: 1
>>>>> Retries
>>>>> remaining: 5 (insecure)] <font color='blue'>WARNING: Shutting down a
>>>>> socket</font>
>>>>>
>>>>>
>>>> This indicates successful creation of a connection, and then shutdown 5
>>>> minutes later, as a result of pruning of idle connections (this is all
>>>> as
>>>> expected).  The IdleTimeout hard-coded in /home/httpd/perl/loncnew is 5
>>>> minutes.
>>>>
>>>>    Thu Sep 17 10:53:24 2015 (3279) [loncapa2.chem.binghamton.edu] [Thu
>>>> Sep
>>>>
>>>>> 17 10:42:10 2015: loncapa2.chem.binghamton.edu Connection count: 0
>>>>> Retries
>>>>> remaining: 5 ()] <font color='yellow'>INFO: Resetting Connection
>>>>> Retries.</font>
>>>>>
>>>>>
>>>> This occurs when &Apache::lonnet::reconlonc() sends "reset_retries".
>>>> &reconlonc($clientname) is called in lond when lond is connected by a
>>>> known
>>>> client (excluding connections from itself).
>>>>
>>>>       Cleanup phase: Notifications
>>>>
>>>>>
>>>>>      Notifying host binghamtona7:not_found
>>>>>      Notifying host binghamtona2:not_found
>>>>>
>>>>>
>>>> The logged "not_found" messages originate from
>>>> lond::update_resource_handler() on the library server, and will be sent
>>>> if
>>>> the old copy of the file which has just been republished is no longer
>>>> present on the access server.
>>>>
>>>> That can happen if the Linux OS on the access server was updated without
>>>> preserving the contents of /home/httpd/html/res/.  This is not an issue,
>>>> including the case where the access server had a subscription to the
>>>> file
>>>> before the access server was updated/rebuilt.  When the file is
>>>> requested
>>>> in a user session hosted on the access server it should be replicated to
>>>> the access server from the library server, as usual.
>>>>
>>>> However this:
>>>>
>>>>    Thu Sep 17 11:00:30 2015 (3908): <font color="blue">WARNING: LWP get:
>>>>
>>>>> 403 Forbidden: /home/httpd/html/res/binghamton/gonzales/Testing
>>>>> Area/foobar.problem</font>
>>>>>
>>>>>
>>>>> indicates that the replication failed.  You might check in the
>>>> corresponding Apache error_log file for the request for:
>>>>
>>>> GET /raw/binghamton/gonzales/TestingArea/foobar.problem
>>>>
>>>> from 128.226.130.9.
>>>>
>>>> See:
>>>>
>>>>
>>>> http://mail.lon-capa.org/pipermail/lon-capa-admin/2013-November/002773.html
>>>>
>>>> for a discussion of things to check.  In that particular instance the
>>>> "Forbidden" error was caused by an incorrect IP address in /etc/hosts
>>>>
>>>> Stuart Raeburn
>>>> LON-CAPA Academic Consortium
>>>>
>>>>
>>>> Quoting Bob Gonzales <rgonzal at binghamton.edu>:
>>>>
>>>> Hi,
>>>>
>>>>>
>>>>> I'm running 2.11.1 and centos 7.1 on all my servers.
>>>>>
>>>>> When I publish a resource and then try to access it on any of my access
>>>>> servers I get the message 'unable to find' and then the name of the
>>>>> problem
>>>>> just published.
>>>>>
>>>>> The subscription file for the problem shows the access server has
>>>>> subscribed.
>>>>>
>>>>> When I republished a file the log file for the problem I got the
>>>>> following
>>>>> information from the log file for the problem:
>>>>>
>>>>>       Cleanup phase: Notifications
>>>>>
>>>>>      Notifying host binghamtona7:not_found
>>>>>      Notifying host binghamtona2:not_found
>>>>>      Notifying course binghamton_71215825fc2344762binghamtonl1:ok
>>>>>
>>>>> I see a lot of these messages for binghamtona2 (loncapa2) in the
>>>>> library
>>>>> server lonnet.log
>>>>>
>>>>> Thu Sep 17 10:53:24 2015 (8257): Trying to reconnect lonc for
>>>>> binghamtona2 (
>>>>> loncapa2.chem.binghamton.edu)
>>>>>
>>>>> And these in the library servers lonc.log
>>>>>
>>>>>     Thu Sep 17 10:42:10 2015 (3279) [loncapa2.chem.binghamton.edu]
>>>>> [Thu
>>>>> Sep
>>>>> 17 10:42:10 2015: loncapa2.chem.binghamton.edu Connection count: 0
>>>>> Retries
>>>>> remaining: 5 ()] <font color='yellow'>INFO: Resetting Connection
>>>>> Retries.</font>
>>>>>    Thu Sep 17 10:53:24 2015 (3279) [loncapa2.chem.binghamton.edu] [Thu
>>>>> Sep
>>>>> 17 10:42:10 2015: loncapa2.chem.binghamton.edu Connection count: 0
>>>>> Retries
>>>>> remaining: 5 ()] <font color='yellow'>INFO: Resetting Connection
>>>>> Retries.</font>
>>>>>    Thu Sep 17 10:53:24 2015 (3279) [loncapa2.chem.binghamton.edu] [Thu
>>>>> Sep
>>>>> 17 10:53:24 2015: loncapa2.chem.binghamton.edu Connection count: 0
>>>>> Retries
>>>>> remaining: 5 ()] <font color='yellow'>INFO: Resetting Connection
>>>>> Retries.</font>
>>>>>    Thu Sep 17 10:53:24 2015 (3279) [loncapa2.chem.binghamton.edu] [Thu
>>>>> Sep
>>>>> 17 10:53:24 2015: loncapa2.chem.binghamton.edu  Connection count: 0
>>>>> Retries
>>>>> remaining: 5 ()] <font color='yellow'>INFO: Resetting Connection
>>>>> Retries.</font>
>>>>>    Thu Sep 17 11:05:19 2015 (3279) [loncapa2.chem.binghamton.edu] [Thu
>>>>> Sep
>>>>> 17 10:53:24 2015: loncapa2.chem.binghamton.edu Connection count: 0
>>>>> Retries
>>>>> remaining: 5 ()] <font color='green'>SUCCESS: Created connection 1 to
>>>>> host
>>>>> loncapa2.chem.binghamton.edu</font>
>>>>>    Thu Sep 17 11:05:19 2015 (3279) [loncapa2.chem.binghamton.edu] [Thu
>>>>> Sep
>>>>> 17 10:53:24 2015: loncapa2.chem.binghamton.edu Connection count: 0
>>>>> Retries
>>>>> remaining: 5 ()] <font color='yellow'>INFO: Connected to lond version:
>>>>> 489</font>
>>>>>    Thu Sep 17 11:05:19 2015 (3279) [loncapa2.chem.binghamton.edu] [Thu
>>>>> Sep
>>>>> 17 10:53:24 2015: loncapa2.chem.binghamton.edu Connection count: 0
>>>>> Retries
>>>>> remaining: 5 ()] <font color='green'>SUCCESS: Connection 1 to
>>>>> loncapa2.chem.binghamton.edu now ready for action</font>
>>>>>    Thu Sep 17 11:10:20 2015 (3279) [loncapa2.chem.binghamton.edu] [Thu
>>>>> Sep
>>>>> 17 11:10:20 2015: loncapa2.chem.binghamton.edu Connection count: 1
>>>>> Retries
>>>>> remaining: 5 (insecure)] <font color='blue'>WARNING: Shutting down a
>>>>> socket</font>
>>>>>
>>>>> I also see this in the access servers lonnet.log:
>>>>>
>>>>>    Thu Sep 17 11:00:30 2015 (3908): <font color="blue">WARNING: LWP
>>>>> get:
>>>>> 403 Forbidden: /home/httpd/html/res/binghamton/gonzales/Testing
>>>>> Area/foobar.problem</font>
>>>>>
>>>>> I don't really know what all of this means.  Any help would be
>>>>> appreciated.
>>>>>
>>>>> Thanks,
>>>>> Bob Gonzales
>>>>> Binghamton University
>>>>> Chemistry Dept
>>>>>
>>>>>
>> _______________________________________________
>> LON-CAPA-admin mailing list
>> LON-CAPA-admin at mail.lon-capa.org
>> http://mail.lon-capa.org/mailman/listinfo/lon-capa-admin
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.lon-capa.org/pipermail/lon-capa-admin/attachments/20150920/8f31f7a1/attachment-0001.html>


More information about the LON-CAPA-admin mailing list