[LON-CAPA-admin] Access Server Problems

Stuart Raeburn raeburn at msu.edu
Thu Oct 30 19:44:46 EDT 2008


Todd,

I can now browse /res/csm when logged into http://lc1.mines.edu so you  
were evidently able to resolve the connection issue between your  
access server and the csm library server.

Stuart Raeburn
MSU LON-CAPA group

Quoting Stuart Raeburn <raeburn at msu.edu>:

> Todd,
>
> My guess is that within your internal network, lc1.mines.edu appears to
> have an IP address of 138.67.1.77 (host 138.67.1.77 reports:
> mica.Mines.EDU) when attempting to set up a connection, whereas outside
> the network it is mapping to 138.67.38.59 (which is what host
> lc1.mines.edu says it should be).
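A quick way to compare what the resolver reports from inside and outside the network is to run the same forward and reverse lookup from both vantage points. The following is only a minimal sketch in Python: the function name forward_reverse and its injectable resolve/reverse parameters are illustrative assumptions and are not part of LON-CAPA.

```python
import socket


def forward_reverse(hostname, resolve=socket.gethostbyname,
                    reverse=socket.gethostbyaddr):
    """Return (ip, reverse_name, consistent) for a hostname.

    consistent is True when the reverse lookup of the forward address
    gives back the original name (DNS names compare case-insensitively).
    """
    ip = resolve(hostname)
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        reverse_name = reverse(ip)[0]
    except OSError:
        # No PTR record, or the lookup failed
        reverse_name = None
    consistent = (
        reverse_name is not None
        and reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    )
    return ip, reverse_name, consistent


# Example call (performs real DNS lookups, so it is left commented out):
# print(forward_reverse("lc1.mines.edu"))
```

Running the commented call once on a machine inside the campus network and once outside it, and comparing the two tuples, would show whether the two views of lc1.mines.edu really differ.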
>
> Perhaps this is a consequence of the configuration for your virtualization?
> I say this because http://lc2.mines.edu/lon-status/ includes the
> following from lond.log:
>
> Wed Oct 29 05:00:50 2008 (6782): WARNING: Rejected client 138.67.1.77, closing connection
> Wed Oct 29 05:00:50 2008 (6782): CRITICAL: Disconnect from 138.67.1.77 ()
> Wed Oct 29 05:00:50 2008 (9055): Child 6782 died
> Wed Oct 29 05:02:50 2008 (9055):  Attempting to start child (IO::Socket::INET=GLOB(0x8d3248c))
> Wed Oct 29 05:02:50 2008 (6785): WARNING: Unknown client 138.67.1.77
> Wed Oct 29 05:02:50 2008 (6785): WARNING: Rejected client 138.67.1.77, closing connection
> Wed Oct 29 05:02:50 2008 (6785): CRITICAL: Disconnect from 138.67.1.77 ()
> Wed Oct 29 05:02:50 2008 (9055): Child 6785 died
> Wed Oct 29 05:04:50 2008 (9055):  Attempting to start child \CRITICAL: Disconnect from 138.67.1.77 ()
> Wed Oct 29 05:08:50 2008 (9055): Unable to determine who caller was, getpeername returned nothing
> Wed Oct 29 05:08:50 2008 (9055): Unable to determine clientip
> Wed Oct 29 05:08:50 2008 (9055): Child 8939 died
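To see at a glance which addresses lond is turning away, the "Unknown client" and "Rejected client" messages in an excerpt like the one above can be tallied per IP. This is a rough Python sketch, not a LON-CAPA tool: the function name tally_refused_clients is hypothetical, and the sample text is abbreviated from the excerpt.

```python
import re
from collections import Counter

# Matches the two lond.log messages seen in the excerpt, capturing the IPv4 address.
REFUSED = re.compile(r"(?:Rejected|Unknown) client (\d{1,3}(?:\.\d{1,3}){3})")


def tally_refused_clients(log_text):
    """Count 'Unknown client' / 'Rejected client' messages per IP address."""
    # Collapse line wrapping so entries split across lines still match.
    flat = " ".join(log_text.split())
    return Counter(REFUSED.findall(flat))


sample = """
Wed Oct 29 05:00:50 2008 (6782): WARNING: Rejected client 138.67.1.77,
closing connection Wed Oct 29 05:02:50 2008 (6785):
WARNING: Unknown client 138.67.1.77 Wed Oct 29 05:02:50 2008 (6785):
WARNING: Rejected client 138.67.1.77, closing connection
"""
print(tally_refused_clients(sample))
```

An address that shows up here but is not listed for any host in the cluster is a likely candidate for the kind of mapping problem described above.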
>
> http://loncapa.mines.edu/lon-status/ has the same in lond.log from
> September 18th (the most recent file).
>
> Is loncron not being run nightly by cron at 5:10 am on your library
> server, loncapa.mines.edu? The date at the top of this lon-status page
> reads:
>
> LON Status Report csml1
> Thu Sep 18 05:10:58 2008
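The age of a status report can be worked out from a date line in this format. The sketch below is in Python and assumes an English/C locale for the day and month abbreviations; the function name report_age and the one-day staleness threshold are illustrative choices, and the "now" value is simply the date of this message.

```python
from datetime import datetime, timedelta


def report_age(date_line, now):
    """Parse a line such as 'Thu Sep 18 05:10:58 2008' and return its age."""
    stamp = datetime.strptime(date_line.strip(), "%a %b %d %H:%M:%S %Y")
    return now - stamp


# Timestamp taken from the lon-status page quoted above.
age = report_age("Thu Sep 18 05:10:58 2008",
                 now=datetime(2008, 10, 30, 19, 44, 46))

# A nightly job should leave a report less than a day old.
is_stale = age > timedelta(days=1)
print(age.days, is_stale)
```

A report many days old on a server where loncron is expected to run nightly suggests the cron job is not firing or is failing early.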
>
> By contrast http://lc1.mines.edu/lon-status/ reports connections to
> servers in other domains (binghamton, uiuc) for October 29.
>
> I verified that I could login to csma1 with a username from the msu
> domain, and view resources in the msu domain, so it seems that
> connections to/from other servers are functional, just not the
> connection to csml1.  The "Userfile repcopy failed .." entries in
> lonnet.log are expected in cases where LON-CAPA is attempting to
> retrieve a non-existent .meta file (e.g., for a .sequence file in
> uploaded/$dom/...).
>
> If the strange IP address (138.67.1.77) is not relevant to this issue,
> then I think you may need to set $DebugLevel initially to 1 in
> /home/httpd/perl/loncnew on csma1 and then restart loncontrol. Look in
> lonc.log for the debug messages, and increment the level by 1 up to a
> maximum of 8 or 9 (with a loncontrol restart after each change) until
> you see something useful in lonc.log where a connection to csml1 is
> attempted and fails.
>
> You might also want to set LoadLim to a value less than 1 in
> /etc/httpd/conf/loncapa.conf so that your server is not hosting
> sessions from other domains while you attempt to determine why csma1
> and csml1 cannot sustain a TCP/IP connection via port 5663.
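Whether port 5663 is reachable at all between the two machines can be checked independently of LON-CAPA. This is a minimal Python sketch; the function name can_connect and the 5-second timeout are assumptions made for illustration.

```python
import socket


def can_connect(host, port=5663, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False


# Example calls (they touch the network, so they are left commented out):
# print(can_connect("loncapa.mines.edu"))   # run on csma1
# print(can_connect("lc1.mines.edu"))       # run on the library server
```

A True result only shows that the port is open. lond may still close the connection straight away if it does not recognise the client address, as in the "Rejected client" entries quoted earlier.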
>
> The only other discrepancy I found is that csma1 has a hostname of
> lc1.Mines.EDU according to the dns_hosts.tab info retrieved from the
> LON-CAPA DNS servers at SFU, UIUC and MSU, but it looks as though the
> hostname is lc1.mines.edu in the hosts.tab file on csma1.  It doesn't
> seem that this case difference should have any effect, though.
>
> Stuart Raeburn
> MSU LON-CAPA group
>
>
>
> Quoting Todd Ruskell <truskell at mines.edu>:
>
>> Mark,
>>
>> Thanks for the ideas.  Totally clean start.  For better or worse, our
>> server guys are experimenting with VMs running on top of VMWare's ESXi
>> hypervisor, and they've started the LON-CAPA experiment with this access
>> server.  As far as I know, that extra layer shouldn't cause this type of
>> problem, though.
>>
>> Once upon a time I had attempted the security certificate setup, and
>> should likely do that again, but as of right now my servers allow both
>> secure and insecure inter-server communications.
>>
>> Todd
>>
>> Mark Lucas wrote:
>>> Todd,
>>>
>>> from a non-guru, did you rebuild totally from scratch, or did you
>>> restore parts of the /home/httpd directory? (I always start access
>>> servers totally fresh). Did you change machines underneath?
>>>
>>> I've confirmed that I can log in, access resources on oucapa2, but not
>>> under the csm domain. Do you have security certificates set up that
>>> might have caused grief?
>>>
>>> Mark
>>>
>>>
>>> On Mon, 2008-10-27 at 23:45 -0600, Todd Ruskell wrote:
>>>> So, this is a strange one for me.  I'd appreciate any advice.
>>>>
>>>> We just rebuilt an access server, lc1.mines.edu (csma1), with CentOS 5.
>>>> As far as I can tell, no errors on install, no missing dependencies.
>>>> loncontrol and httpd start just fine.  I *think* that ports 80, 8080,
>>>> and 5663 should all be open on both machines.  Our other access server,
>>>> csma2 still works fine.
>>>>
>>>> Problem is, my library server can't find csma1.  I'm also pretty sure
>>>> that csma1 can't find the library server, either.  I've included
>>>> relevant output from the access server logs, below, for a representative
>>>> attempt to log in with a Kerberos account.  I can't log in with an
>>>> internally authenticated account, either.  I don't get anything on the
>>>> library server logs.
>>>>
>>>> Whenever I attempt to log in, I get "Username and/or password could not
>>>> be authenticated" from my browser.
>>>>
>>>> Now, the funny thing is that csma1 is also in spare.tab, and as soon as
>>>> it came back online it started accepting logins from binghamton (I
>>>> included a couple lines in lonnet.log to show this), including
>>>> authorization.  So csma1 will, in fact, talk to some members of the
>>>> network in some fashion.  Now, I also got a *lot* of "Userfile repcopy
>>>> failed" messages appearing in lonnet.log.  Perhaps that means that I
>>>> don't have full communication?
>>>>
>>>> But, /home/httpd/html/res is populated by resources, presumably from the
>>>> chemistry courses the binghamton users were attempting to access.
>>>>
>>>> I patiently await the word of the almighty gurus.
>>>>
>>>> Thanks,
>>>>
>>>> Todd
>>>>
>>>>
>>>> lonc.log:
>>>> Mon Oct 27 22:58:09 2008 (26966) [loncapa.Mines.EDU] [Mon Oct 27
>>>> 22:29:21 2008: loncapa.Mines.EDU Connection count: 0 Retries remaining:
>>>> 5 ()] <font color='red'>CRITICAL: Failed to make a connection with
>>>> lond.</font>
>>>> Mon Oct 27 22:58:09 2008 (26966) [loncapa.Mines.EDU] [Mon Oct 27
>>>> 22:29:21 2008: loncapa.Mines.EDU Connection count: 0 Retries remaining:
>>>> 5 ()] <font color='blue'>WARNING: Failing transaction sethost</font>
>>>> Mon Oct 27 22:58:12 2008 (26966) [loncapa.Mines.EDU] [Mon Oct 27
>>>> 22:29:21 2008: loncapa.Mines.EDU Connection count: 0 Retries remaining:
>>>> 5 ()] <font color='red'>CRITICAL: Failed to make a connection with
>>>> lond.</font>
>>>> Mon Oct 27 22:58:16 2008 (26966) [loncapa.Mines.EDU] [Mon Oct 27
>>>> 22:58:12 2008: loncapa.Mines.EDU Connection count: 0 Retries remaining:
>>>> 5 ()] <font color='red'>CRITICAL: Failed to make a connection with
>>>> lond.</font>
>>>> Mon Oct 27 22:58:16 2008 (26966) [loncapa.Mines.EDU] [Mon Oct 27
>>>> 22:58:12 2008: loncapa.Mines.EDU Connection count: 0 Retries remaining:
>>>> 5 ()] <font color='blue'>WARNING: Failing transaction sethost</font>
>>>>
>>>> lonhttpd.log:
>>>> 138.67.129.10 - - [27/Oct/2008:22:58:17 +0000] "GET
>>>> /adm/lonInterFace/student.jpg HTTP/1.1" 200 32041
>>>> "http://lc1.mines.edu/adm/authenticate" "
>>>> Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.17) Gecko/20080924
>>>> Ubuntu/8.04 (hardy) Firefox/2.0.0.17"
>>>>
>>>> lonnet.log:
>>>> Mon Oct 27 14:51:58 2008 (5160): Userfile repcopy failed for
>>>> uploaded/binghamton/1A2474087a384482fbinghamtonl1/group_allfolders.sequence.meta
>>>> Mon Oct 27 19:34:26 2008 (27182): User xu7418 at binghamton authorized
>>>> by binghamtonl1
>>>> Mon Oct 27 22:58:12 2008 (10233): Trying to reconnect lonc
>>>> Mon Oct 27 22:58:12 2008 (10233): lonc at pid 26832 responding, sending USR1
>>>> Mon Oct 27 22:58:16 2008 (10233): User truskell at csm is unknown in
>>>> authenticate
>>>>
>>>>
>>>
>>> _______________________________________________
>>> LON-CAPA-admin mailing list
>>> LON-CAPA-admin at mail.lon-capa.org
>>> http://mail.lon-capa.org/mailman/listinfo/lon-capa-admin
>>
>> --
>> Dr. Todd Ruskell
>> Senior Lecturer, Department of Physics       Office:  Meyer Hall 326
>> Colorado School of Mines                     Phone: 303-384-2080
>> 1523 Illinois Street                         Fax: 303-273-3919
>> Golden, CO 80401
>>





