[LON-CAPA-admin] lonbalancer

H. K. Ng hkng at fsu.edu
Fri Jun 1 09:48:48 EDT 2007


Hi Guy,

> > In the past few days, I have had complaints that students were able
> > to log in to the lonbalancer server but got stuck on the "switch
> > server" page. Checking the lonnet.log file, there are lots of entries
> > for a given user showing something like this:
> >
> > Wed May 30 13:50:24 2007 (27873): SSO authorized user bmp06d
> > Wed May 30 13:50:24 2007 (27873): Flushing log buffers
> > Wed May 30 13:50:25 2007 (27869): SSO authorized user bmp06d
> > Wed May 30 13:50:25 2007 (27869): Flushing log buffers
> > Wed May 30 13:50:26 2007 (27868): SSO authorized user bmp06d
>
> >
> > The user activity file shows that at the same time the lonbalancer is
> > trying to hand the session over to another server but that server is
> > not accepting the service - as far as I can tell. (See sample entries
> > below from the activity file.)
> >
> > Wed May 30, 2007 13:50:24 - 1180547424:fsua0:Switch Server to fsua1
> > with role  128.186.24.44
> > Wed May 30, 2007 13:50:25 - 1180547425:fsua0:Switch Server to fsua1
> > with role  128.186.24.44
>
>That's weird. My guess is that something is misconfigured in some way.

There is only one hint: somewhere in the configuration there is a
longcapa1.fsu.edu instead of loncapa1.fsu.edu (the server was set up
by the technical support shop). This appears when I ssh into the
server and get the following message:

reverse mapping checking getaddrinfo for longcapa1.fsu.edu failed - 
POSSIBLE BREAK-IN ATTEMPT!

I have checked the following files - ifcfg-eth*, network, hosts,
sshd_config, ssh_config, and all files under lonTab - but cannot find
any longcapa1.fsu.edu entry - it is driving me nuts!! Having said that,
the message does not seem to affect hosting sessions - I can log in
directly to the server and it works fine.
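
One possible way to narrow this down: as far as I understand it, sshd prints this warning when the reverse (PTR) lookup for an address returns a name that then fails to resolve forward. If so, the misspelled name may live in a DNS PTR record rather than in any local file. Below is a minimal Python sketch of that forward-confirmed reverse lookup. The function name check_reverse_mapping and the injectable lookup parameters are illustrative assumptions, not part of LON-CAPA or OpenSSH.

```python
import socket


def check_reverse_mapping(ip,
                          reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Return (ptr_name, forward_ips, ok) for an IP address.

    ok is True only when the name from the reverse lookup resolves
    forward to a list of addresses that contains the original IP.
    The lookup functions are parameters so the check can be tried
    with fake resolvers (hypothetical design choice for this sketch).
    """
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        ptr_name = reverse_lookup(ip)[0]
    except OSError:
        # No PTR record at all
        return None, [], False
    try:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        forward_ips = forward_lookup(ptr_name)[2]
    except OSError:
        # PTR name exists but does not resolve: the case sshd warns about
        return ptr_name, [], False
    return ptr_name, forward_ips, ip in forward_ips
```

Calling check_reverse_mapping with the address that sshd reports should show which name the PTR record returns and whether that name resolves back. A result such as ('longcapa1.fsu.edu', [], False) would point at the DNS zone rather than at the server's own files.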


>switchserver doesn't have a retry mechanism behind it. All it does is
>send the necessary login credentials through lonc/d and get back a
>token for those credentials. Switchserver then generates a redirect
>webpage to the switched-to host, which should see the token, check the
>internally stored credentials, and log the user in.
>
>Hmmm, looking at your setup, I guess if the credentials fail in some
>way so the user can't actually get logged into the new server, they
>should end up at the lon-capa login screen (migrateuser redirects to
>/adm/login on failure).
>
>Can you track down more info from the logs related to the above
>event?
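
To make the hand-off described above concrete, here is a toy Python model of it: credentials are stored on the target, a token comes back, the browser is redirected with that token, and the target either logs the user in or falls back to /adm/login. This is only a sketch under stated assumptions, not LON-CAPA's actual code. The class TargetServer, the function switch_server, the URL shape, and the single-use token behaviour are all hypothetical.

```python
import secrets


class TargetServer:
    """Hypothetical stand-in for the switched-to host."""

    def __init__(self):
        self._pending = {}  # token -> credentials stored for the hand-off

    def store_credentials(self, credentials):
        # The balancer sends credentials (through lonc/lond in the real
        # system) and receives a token that refers to them.
        token = secrets.token_hex(16)
        self._pending[token] = credentials
        return token

    def migrate_user(self, token):
        # The redirected browser presents the token. Assumption: a token
        # can be redeemed once; anything unknown goes to the login page.
        credentials = self._pending.pop(token, None)
        if credentials is None:
            return ("redirect", "/adm/login")
        return ("logged_in", credentials["user"])


def switch_server(target, credentials, host):
    # One attempt only, with no retry loop, matching the description above.
    token = target.store_credentials(credentials)
    # Illustrative URL shape, not taken from the LON-CAPA source.
    return f"http://{host}/adm/migrateuser?token={token}"
```

In this model a failed hand-off produces exactly one redirect to /adm/login, so a stream of repeated "Switch Server" entries for the same user would have to come from something re-entering the balancer each time rather than from a retry inside the hand-off itself.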

The only other entries are in the CAS.log which has the following 
entries under the user.

Service = 'http://loncapa.fsu.edu/adm/logout'; ticket = '(null)'
Service = 'http://loncapa.fsu.edu/adm/logout'; ticket = 
'ST-50520-Vv8l4q2gf6IscsaTyHYD'
Successful primary authentication for bmp06d
Wed May 30 13:50:24 2007

Service = 'http://loncapa.fsu.edu/adm/logout'; ticket = '(null)'
Service = 'http://loncapa.fsu.edu/adm/logout'; ticket = 
'ST-50525-9HIbdLMksfwdtTKyJlYr'
Successful primary authentication for bmp06d
Wed May 30 13:50:25 2007

Service = 'http://loncapa.fsu.edu/adm/logout'; ticket = '(null)'
Service = 'http://loncapa.fsu.edu/adm/logout'; ticket = 
'ST-50528-KxP70LA5uvRjB8FTRoic'
Successful primary authentication for bmp06d
Wed May 30 13:50:26 2007

Service = 'http://loncapa.fsu.edu/adm/logout'; ticket = 
'ST-50528-KxP70LA5uvRjB8FTRoic'
Service = 'http://loncapa.fsu.edu/adm/logout'; ticket = 
'ST-50534-DqBqj6sd1TrXC0n2fx7G'
Successful primary authentication for bmp06d
Wed May 30 13:50:27 2007

Service = 'http://loncapa.fsu.edu/adm/logout'; ticket = 
'ST-50534-DqBqj6sd1TrXC0n2fx7G'
Service = 'http://loncapa.fsu.edu/adm/logout'; ticket = 
'ST-50541-FOU81huddpN6bAtaX5n5'
Successful primary authentication for bmp06d
Wed May 30 13:50:29 2007

Service = 'http://loncapa.fsu.edu/adm/menu'; ticket = '(null)'
Service = 'http://loncapa.fsu.edu/adm/menu'; ticket = 
'ST-50547-FfJEUOO8TUDvK3ll9zcg'
Successful primary authentication for bmp06d
Wed May 30 13:50:31 2007
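
To find out how many users are hitting this, the CAS.log entries above could be scanned for bursts of repeated authentications. Below is a minimal Python sketch, assuming the log always has the shape of the excerpt above (a "Successful primary authentication for USER" line followed by a timestamp line) and that timestamps use English day and month names. The function name find_login_bursts and its thresholds are illustrative choices.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches the user line and the timestamp line that follows it.
AUTH_RE = re.compile(
    r"Successful primary authentication for (\S+)\s*\n\s*"
    r"(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
)


def find_login_bursts(log_text, min_count=3, window=timedelta(seconds=10)):
    """Return {user: [timestamps]} for users who authenticated at
    least min_count times within any span of length `window`."""
    by_user = defaultdict(list)
    for user, stamp in AUTH_RE.findall(log_text):
        # Collapse runs of spaces (e.g. "Jun  1") before parsing.
        normalized = " ".join(stamp.split())
        by_user[user].append(
            datetime.strptime(normalized, "%a %b %d %H:%M:%S %Y"))

    bursts = {}
    for user, times in by_user.items():
        times.sort()
        start = 0
        # Sliding window over the sorted timestamps.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= min_count:
                bursts[user] = times
                break
    return bursts
```

Run against the whole CAS.log, this would list every user who was authenticated several times within a few seconds, which could then be matched against the "Switch Server" entries in the activity file.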



>What machines did they visit, and what URLs on those machines?

I cannot find any log entries - I checked the usual suspects,
everything under perl/logs/.


>Did they really get sent to loncapa1.fsu.edu which somehow bounced
>them back in some way?

It seems that way from the logs. The spare.tab file on 
loncapa1.fsu.edu does not contain fsua0 (the lonbalancer server) so 
it is not fsua1 being busy and sending it back to fsua0 ... Not sure 
how else to check this - since it does not happen all the time??


> > Any idea how to solve this?
>
>I'm not sure what's failing yet so no idea yet....
>
> > I removed fsua1 from the spare.tab and
> > restarted loncontrol, but I think it is httpd that I need to restart.
>
>Yes, restart the webserver.
>
> > Is there a way for lonbalancer to hand the session to a third server
> > after say, 3 unsuccessful tries?
>
>Shouldn't ever be more than 1 try...

Thanks,
-hk



