[LON-CAPA-admin] [Fwd: Re: LON-CAPA clustering]

Mark Lucas lucasm at ohio.edu
Thu Sep 25 22:14:51 EDT 2008


Glen MacLachlan wrote:
> Dear Mark--
>
> I have been thinking about setting up a LON CAPA cluster for GWU and I've
> done some googling and read the admin help pages,
> http://help.loncapa.org/cgi-bin/fom?_recurse=1&file=1
>
> but I'm wondering if there is any actual documentation on the topic. I
> haven't done an exhaustive search so I apologize if it is highly visible
> but I just missed it. Are there any pitfalls or special requirements? My
> primary goal is redundancy and it seemed this would be the best way to
> achieve that. What does the system at OU look like? What happens if your
> machine goes down?
>
> Regards, Glen
Glen,

I'm going to reply here on loncapa-admin so that others can correct me 
and add to my answers.

I do not know of any real manual on setting up multiple machines in a 
domain. You'll find bits of accumulated wisdom throughout the 
loncapa-admin mailing list.

As you probably know, a machine is either a library server or access 
server. The library server holds all of the definitive records for a 
domain: student records, published work, course information. If your 
library server goes down, the best you can do is just get it up as fast 
as possible. I do not know of anything simple one can do to run a quick 
failover mode here. There may be some fancy solutions involving 
network-attached storage and a second machine. Stuart gets to weigh in 
here with any ideas 8)

My solution is to have two machines that are configured in exactly the 
same way. If oucapa2 goes belly up, ohioua6 (capa10) has the exact same 
hardware. As long as the disks are okay, I can plug them into capa10 and 
turn it into capa2 without too much hassle. It's a lot easier to find a 
smaller machine to take the place of capa10.

Access servers are a 'generic install'. There is no permanent data 
stored on an access server. If one of my access servers goes down for 
good, I'm fond of saying I just dropkick it across the room and can set 
up another server fairly quickly.

Typically a domain will have one library server. If there is a load 
problem, the accepted policy is to bring in two access servers as the 
next step. Adding just one access server while students continue to log 
in to the library server as well isn't really recommended: the library 
server still gets bogged down and serves poorly, so supposedly you don't 
gain much.

If you are having load problems, there are several things you can do:

* If it's a periodic peak load issue, you can make sure your spare.tab 
lists access servers from other domains. In a pinch, students should be 
able to log into any access server on the network and work on their 
course from GWU. MSU places one or two of their machines in the default 
spare.tab, as does Colorado School of Mines. An access server from Ohio 
University could also be placed in there. You may want to check with 
that domain's coordinator first as a courtesy. You also want to be a 
little careful to make sure those machines (and your own machine) are 
running fairly up-to-date versions.

* If it's a significant enough load issue, you can get a couple of 
access servers.

One simple way to load balance is to have the local DNS table set up a 
round robin on some alias. I typically ask students to log into 
http://loncapa.phy.ohiou.edu. Until just this summer, I would then have 
this alias point to capa7, capa8 and capa10. Each of these will spill 
over to capa6 as a spare. During summers when the load is lighter and I 
might be working on the access servers, I can point the alias to my 
library server and let everyone use that one machine. (Just beware that 
students, even if told not to, will bookmark individual machines.)

This summer we took the dive and set up a balancing machine. MSU and FSU 
use a balancer. You can dedicate a fairly small machine whose sole task 
is to act as a gateway. It runs a complete LON-CAPA installation, but it 
simply authenticates, determines what machines are available, and 
switches the students to one of these machines. Right now I'm holding my 
breath and using an old PowerEdge 1550 - 800ish MHz with only 512 MB. (I 
should really have more memory. Even though it is lobotomized on this 
machine, LON-CAPA still wants to start up the maxima servers, etc. Is 
there any way to keep this down?) So far so good.
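
To make the idea of a balancer concrete, here is a minimal Python sketch 
of "determine what machines are available and switch the student to one 
of them". This is not lonbalancer's actual code or algorithm. The 
AccessServer record, the load figure, the max_load threshold and the 
pick_server function are all hypothetical names made up for 
illustration, and the load numbers are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessServer:
    name: str     # host label, e.g. "capa7"
    up: bool      # whether the machine answered a health check (assumed)
    load: float   # hypothetical load figure: fraction of capacity in use


def pick_server(servers: List[AccessServer],
                max_load: float = 1.0) -> Optional[AccessServer]:
    """Return the least-loaded server that is up and below max_load.

    Returns None when no server qualifies, so the caller can fall back
    to a spare or show an error page.
    """
    candidates = [s for s in servers if s.up and s.load < max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.load)


# Invented example state: capa8 is down, capa10 is the least busy.
servers = [
    AccessServer("capa7", up=True, load=0.60),
    AccessServer("capa8", up=False, load=0.00),
    AccessServer("capa10", up=True, load=0.35),
]

chosen = pick_server(servers)
print(chosen.name if chosen else "no server available")
```

In this sketch a dead machine is simply filtered out before the choice 
is made, which is the property that matters for redundancy.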

There are a couple of advantages to a balancer over DNS round robin. A 
round robin simply rolls through the list regardless of the actual load. 
Once a student is sent to a particular machine, they stay on it. 
Lonbalancer at least pays attention to the load on each machine and is 
more intelligent about which machine it switches people to. With DNS 
round robin, if one of three machines is down or LON-CAPA is not working 
on that one machine, a third of the students don't get logged in. With 
the balancer, students are never sent to that machine. Additionally, you 
have control over the spare table on the balancer. I doubt any of us 
have direct control over our own DNS information. It can also take a day 
or two for all the DNS servers in the outside world to refresh their 
caches.
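
The "a third of the students don't get logged in" arithmetic can be 
checked with a small Python sketch. It assumes an idealized round robin 
that hands out hosts in strict rotation. Real DNS round robin is 
messier because of resolver caching, so treat this as an illustration 
only. The function name round_robin_failures and the figure of 300 
logins are made up for the example.

```python
from itertools import cycle


def round_robin_failures(hosts, down, logins):
    """Count logins that an idealized round robin sends to a dead host.

    hosts  -- list of host names behind the alias
    down   -- set of host names that are not serving
    logins -- number of login attempts to simulate
    """
    rotation = cycle(hosts)  # strict rotation, ignoring load and health
    failed = 0
    for _ in range(logins):
        if next(rotation) in down:
            failed += 1
    return failed


hosts = ["capa7", "capa8", "capa10"]
failed = round_robin_failures(hosts, down={"capa8"}, logins=300)
print(f"{failed} of 300 logins were sent to the dead machine")
```

With one of three hosts down, this strict rotation sends exactly 100 of 
the 300 logins to the dead machine.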

So, in answer to your question about redundancy, I'm not sure there's 
much that helps with a failure of your library server besides making 
sure it has RAID and perhaps other redundancies. Be prepared to roll it 
over to another machine in an emergency.

Adding access servers will help if one of your access servers goes down. 
The safest thing is to set up a balancing machine, which will make 
failures in an access server mostly transparent to the students (unless 
they were logged into the server at the time). So my suggestion would be 
to add two access servers and a third, low-power machine as a balancer. 
You may want to make sure one of the access servers could fairly quickly 
turn into your library server in the case of an emergency.

Does this help, Glen?

Anyone else have words of advice about building up redundancy and capacity?


Later,
Mark
Ohio University





