[LON-CAPA-admin] Access Server Problems

Todd Ruskell truskell at mines.edu
Fri Oct 31 10:43:35 EDT 2008


Stuart,

Yes, I did figure it out yesterday, but didn't have a chance to post back.

Boy, am I embarrassed, though... it turned out that the folks who
built the box had put an entry for our library server in /etc/hosts, but
had accidentally exchanged an 8 for a 6 in the IP address, which,
depending on the font, is really easy to miss when troubleshooting,
especially when the two access servers *do* have a 6 for that digit. :(
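
In case it saves someone else the squinting, here is a minimal Python
sketch of how one might compare /etc/hosts entries against what DNS
reports.  The function names, the sample hostname, and the 192.0.2.x
addresses are all made up for illustration.  The resolver is passed in
as an argument because, on a typical Linux setup, socket.gethostbyname
consults /etc/hosts itself, so a DNS-only lookup has to be supplied by
the caller.

```python
def parse_hosts(text):
    """Return {hostname: ip} for every name in /etc/hosts-style text."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            entries[name.lower()] = ip
    return entries


def find_mismatches(entries, resolve):
    """List (name, hosts_ip, dns_ip) where /etc/hosts and DNS disagree.

    `resolve` is a caller-supplied function (an assumption of this
    sketch) that returns the DNS address for a name, or None if unknown.
    """
    mismatches = []
    for name, hosts_ip in entries.items():
        dns_ip = resolve(name)
        if dns_ip is not None and dns_ip != hosts_ip:
            mismatches.append((name, hosts_ip, dns_ip))
    return mismatches


if __name__ == "__main__":
    # Hypothetical data: one digit differs between the two sources.
    hosts_text = "127.0.0.1 localhost\n192.0.2.68 library.example.edu\n"
    dns_view = {"library.example.edu": "192.0.2.66"}
    for name, hosts_ip, dns_ip in find_mismatches(parse_hosts(hosts_text), dns_view.get):
        print(f"{name}: /etc/hosts says {hosts_ip}, DNS says {dns_ip}")
```

Printing the two addresses side by side makes a single wrong digit much
easier to spot than reading the file by eye.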

Thanks for your help.  I now know a bit more about trying to
troubleshoot loncapa connections, should I have to do it again in the
future.

Todd

Stuart Raeburn wrote:
> Todd,
> 
> I can now browse /res/csm when logged into http://lc1.mines.edu so you
> were evidently able to resolve the connection issue between your access
> server and the csm library server.
> 
> Stuart Raeburn
> MSU LON-CAPA group
> 
> Quoting Stuart Raeburn <raeburn at msu.edu>:
> 
>> Todd,
>>
>> My guess is that within your internal network, lc1.mines.edu appears to
>> have an IP address of 138.67.1.77 (host 138.67.1.77 reports:
>> mica.Mines.EDU) when attempting to set up a connection, whereas outside
>> the network it is mapping to 138.67.38.59 (which is what host
>> lc1.mines.edu says it should be).
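
A rough Python sketch of the forward/reverse lookup comparison described
above, roughly what running `host` by hand in both directions gives you.
The function name is made up; the two lookup functions default to the
standard socket calls but can be swapped out, and the answers will
depend on which network (and which /etc/hosts) the script runs on.

```python
import socket


def forward_reverse(name, lookup=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name to an IP, then resolve that IP back to a name.

    Returns (ip, reverse_name, consistent), where `consistent` is True
    only if the reverse name matches the original, ignoring case.
    """
    ip = lookup(name)
    try:
        reverse_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, ips)
    except OSError:
        reverse_name = None  # no reverse record for this address
    consistent = reverse_name is not None and reverse_name.lower() == name.lower()
    return ip, reverse_name, consistent


if __name__ == "__main__":
    print(forward_reverse("lc1.mines.edu"))
```

Running it both inside and outside the campus network and comparing the
results would show whether the two views of the name really differ.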
>>
>> Perhaps this is a consequence of the configuration for your
>> virtualization?
>> I say this because http://lc2.mines.edu/lon-status/ includes the
>> following from lond.log
>>
>> Wed Oct 29 05:00:50 2008 (6782): WARNING: Rejected client 138.67.1.77, closing connection
>> Wed Oct 29 05:00:50 2008 (6782): CRITICAL: Disconnect from 138.67.1.77 ()
>> Wed Oct 29 05:00:50 2008 (9055): Child 6782 died
>> Wed Oct 29 05:02:50 2008 (9055):  Attempting to start child (IO::Socket::INET=GLOB(0x8d3248c))
>> Wed Oct 29 05:02:50 2008 (6785): WARNING: Unknown client 138.67.1.77
>> Wed Oct 29 05:02:50 2008 (6785): WARNING: Rejected client 138.67.1.77, closing connection
>> Wed Oct 29 05:02:50 2008 (6785): CRITICAL: Disconnect from 138.67.1.77 ()
>> Wed Oct 29 05:02:50 2008 (9055): Child 6785 died
>> Wed Oct 29 05:04:50 2008 (9055):  Attempting to start child \CRITICAL: Disconnect from 138.67.1.77 ()
>> Wed Oct 29 05:08:50 2008 (9055): Unable to determine who caller was, getpeername returned nothing
>> Wed Oct 29 05:08:50 2008 (9055): Unable to determine clientip
>> Wed Oct 29 05:08:50 2008 (9055): Child 8939 died
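
For a long lond.log, a small Python sketch like the following could
tally which client addresses are being turned away.  It assumes only
the "Rejected client <ip>" wording seen in the excerpt above; the
function name is made up, and the path in the usage example is a guess
that would need adjusting to wherever lond.log lives on the server.

```python
import re
import sys
from collections import Counter

# Matches the wording seen in the excerpt: "Rejected client 138.67.1.77,"
REJECTED = re.compile(r"Rejected client\s+(\d{1,3}(?:\.\d{1,3}){3})")


def rejected_clients(log_text):
    """Count how many times each client IP was rejected in the log text."""
    return Counter(REJECTED.findall(log_text))


if __name__ == "__main__":
    # Usage (path is an assumption): python rejected.py /path/to/lond.log
    with open(sys.argv[1], errors="replace") as fh:
        for ip, count in rejected_clients(fh.read()).most_common():
            print(f"{count:6d}  {ip}")
```

An address near the top of that list which does not belong to any known
server is a good hint that a hosts or DNS entry is off somewhere.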
>>
>> http://loncapa.mines.edu/lon-status/ has the same in lond.log from
>> September 18th (the most recent file).
>>
>> Is loncron not being run nightly by cron at 5.10 am on your library
>> server: loncapa.mines.edu? The date at the top of this lon-status page
>> reads:
>>
>> LON Status Report csml1
>> Thu Sep 18 05:10:58 2008
>>
>> By contrast http://lc1.mines.edu/lon-status/ reports connections to
>> servers in other domains (binghamton, uiuc) for October 29.
>>
>> I verified that I could log in to csma1 with a username from the msu
>> domain, and view resources in the msu domain, so it seems that
>> connections to/from other servers are functional, just not the
>> connection to csml1.  The "Userfile repcopy failed .." entries in
>> lonnet.log are expected in cases where LON-CAPA is attempting to
>> retrieve a non-existent .meta file (e.g., for a .sequence file in
>> uploaded/$dom/...).
>>
>> If the strange IP address (138.67.1.77) is not relevant to this issue,
>> then I think you may need to set $DebugLevel initially to 1 in
>> /home/httpd/perl/loncnew on csma1 and then restart loncontrol. Look in
>> lonc.log for the debug messages, and increment the level by 1 up to a
>> maximum of 8 or 9 (with a loncontrol restart each time) until you see
>> something useful in lonc.log where a connection to csml1 is attempted
>> and fails.
>>
>> You might also want to set LoadLim to a value less than 1 in
>> /etc/httpd/conf/loncapa.conf so that your server is not hosting
>> sessions from other domains while you attempt to determine why csma1
>> and csml1 can not sustain a TCP/IP connection via port 5663.
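
As a first check before turning up debug levels, a minimal Python
sketch of a plain TCP connection test on port 5663.  The function name
and the timeout are arbitrary choices for illustration.  A successful
connect only shows that the port is reachable; per the lond.log excerpt
above, lond can still reject a client whose address it does not
recognize.

```python
import socket


def port_open(host, port=5663, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


if __name__ == "__main__":
    for host in ("loncapa.mines.edu", "lc1.mines.edu"):
        state = "reachable" if port_open(host) else "NOT reachable"
        print(f"{host}:5663 is {state}")
```

Running it from each server toward the other shows whether the failure
is in one direction only.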
>>
>> The only other discrepancy I found is that csma1 has a hostname of
>> lc1.Mines.EDU according to the dns_hosts.tab info retrieved from the
>> LON-CAPA DNS servers at SFU, UIUC and MSU, but it looks as though the
>> hostname is lc1.mines.edu in the hosts.tab file on csma1.  It doesn't
>> seem that this case difference should have any effect, though.
>>
>> Stuart Raeburn
>> MSU LON-CAPA group
>>
>>
>>
>> Quoting Todd Ruskell <truskell at mines.edu>:
>>
>>> Mark,
>>>
>>> Thanks for the ideas.  Totally clean start.  For better or worse, our
>>> server guys are experimenting with VMs running on top of VMWare's ESXi
>>> hypervisor, and they've started the LON-CAPA experiment with this access
>>> server.  As far as I know, that extra layer shouldn't cause this type of
>>> problem, though.
>>>
>>> Once upon a time I had attempted the security certificate setup, and
>>> should likely do that again, but as of right now my servers allow both
>>> secure and insecure inter-server communications.
>>>
>>> Todd
>>>
>>> Mark Lucas wrote:
>>>> Todd,
>>>>
>>>> from a non-guru, did you rebuild totally from scratch, or did you
>>>> restore parts of the /home/httpd directory? (I always start access
>>>> servers totally fresh). Did you change machines underneath?
>>>>
>>>> I've confirmed that I can log in, access resources on oucapa2, but not
>>>> under the csm domain. Do you have security certificates set up that
>>>> might have caused grief?
>>>>
>>>> Mark
>>>>
>>>>
>>>> On Mon, 2008-10-27 at 23:45 -0600, Todd Ruskell wrote:
>>>>> So, this is a strange one for me.  I'd appreciate any advice.
>>>>>
>>>>> We just rebuilt an access server, lc1.mines.edu (csma1), with CentOS 5.
>>>>> As far as I can tell, no errors on install, no missing dependencies.
>>>>> loncontrol and httpd start just fine.  I *think* that ports 80, 8080,
>>>>> and 5663 should all be open on both machines.  Our other access server,
>>>>> csma2, still works fine.
>>>>>
>>>>> Problem is, my library server can't find csma1.  I'm also pretty sure
>>>>> that csma1 can't find the library server, either.  I've included
>>>>> relevant output from the access server logs, below, for a representative
>>>>> attempt to log in with a Kerberos account.  I can't log in with an
>>>>> internally authenticated account, either.  I don't get anything in the
>>>>> library server logs.
>>>>>
>>>>> Whenever I attempt to log in, I get "Username and/or password could not
>>>>> be authenticated" from my browser.
>>>>>
>>>>> Now, the funny thing is that csma1 is also in spare.tab, and as soon as
>>>>> it came back online it started accepting logins from binghamton (I
>>>>> included a couple of lines from lonnet.log to show this), including
>>>>> authorization.  So csma1 will, in fact, talk to some members of the
>>>>> network in some fashion.  Now, I also got a *lot* of "Userfile repcopy
>>>>> failed" messages appearing in lonnet.log.  Perhaps that means that I
>>>>> don't have full communication?
>>>>>
>>>>> But, /home/httpd/html/res is populated by resources, presumably from the
>>>>> chemistry courses the binghamton users were attempting to access.
>>>>>
>>>>> I patiently await the word of the almighty gurus.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Todd
>>>>>
>>>>>
>>>>> lonc.log:
>>>>> Mon Oct 27 22:58:09 2008 (26966) [loncapa.Mines.EDU] [Mon Oct 27 22:29:21 2008: loncapa.Mines.EDU Connection count: 0 Retries remaining: 5 ()] <font color='red'>CRITICAL: Failed to make a connection with lond.</font>
>>>>> Mon Oct 27 22:58:09 2008 (26966) [loncapa.Mines.EDU] [Mon Oct 27 22:29:21 2008: loncapa.Mines.EDU Connection count: 0 Retries remaining: 5 ()] <font color='blue'>WARNING: Failing transaction sethost</font>
>>>>> Mon Oct 27 22:58:12 2008 (26966) [loncapa.Mines.EDU] [Mon Oct 27 22:29:21 2008: loncapa.Mines.EDU Connection count: 0 Retries remaining: 5 ()] <font color='red'>CRITICAL: Failed to make a connection with lond.</font>
>>>>> Mon Oct 27 22:58:16 2008 (26966) [loncapa.Mines.EDU] [Mon Oct 27 22:58:12 2008: loncapa.Mines.EDU Connection count: 0 Retries remaining: 5 ()] <font color='red'>CRITICAL: Failed to make a connection with lond.</font>
>>>>> Mon Oct 27 22:58:16 2008 (26966) [loncapa.Mines.EDU] [Mon Oct 27 22:58:12 2008: loncapa.Mines.EDU Connection count: 0 Retries remaining: 5 ()] <font color='blue'>WARNING: Failing transaction sethost</font>
>>>>>
>>>>> lonhttpd.log:
>>>>> 138.67.129.10 - - [27/Oct/2008:22:58:17 +0000] "GET /adm/lonInterFace/student.jpg HTTP/1.1" 200 32041 "http://lc1.mines.edu/adm/authenticate" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.17) Gecko/20080924 Ubuntu/8.04 (hardy) Firefox/2.0.0.17"
>>>>>
>>>>> lonnet.log:
>>>>> Mon Oct 27 14:51:58 2008 (5160): Userfile repcopy failed for uploaded/binghamton/1A2474087a384482fbinghamtonl1/group_allfolders.sequence.meta
>>>>> Mon Oct 27 19:34:26 2008 (27182): User xu7418 at binghamton authorized by binghamtonl1
>>>>> Mon Oct 27 22:58:12 2008 (10233): Trying to reconnect lonc
>>>>> Mon Oct 27 22:58:12 2008 (10233): lonc at pid 26832 responding, sending USR1
>>>>> Mon Oct 27 22:58:16 2008 (10233): User truskell at csm is unknown in authenticate
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> LON-CAPA-admin mailing list
>>>> LON-CAPA-admin at mail.lon-capa.org
>>>> http://mail.lon-capa.org/mailman/listinfo/lon-capa-admin
>>>
>>> -- 
>>> Dr. Todd Ruskell
>>> Senior Lecturer, Department of Physics       Office:  Meyer Hall 326
>>> Colorado School of Mines                     Phone: 303-384-2080
>>> 1523 Illinois Street                         Fax: 303-273-3919
>>> Golden, CO 80401

-- 
Dr. Todd Ruskell
Senior Lecturer, Department of Physics       Office:  Meyer Hall 326
Colorado School of Mines                     Phone: 303-384-2080
1523 Illinois Street                         Fax: 303-273-3919
Golden, CO 80401


