[LON-CAPA-admin] [Fwd: Re: LON-CAPA clustering]
Mark Lucas
lucasm at ohio.edu
Thu Sep 25 22:14:51 EDT 2008
Glen MacLachlan wrote:
> Dear Mark--
>
> I have been thinking about setting up a LON CAPA cluster for GWU and I've
> done some googling and read the admin help pages,
> http://help.loncapa.org/cgi-bin/fom?_recurse=1&file=1
>
> but I'm wondering if there is any actual documentation on the topic. I
> haven't done an exhaustive search so I apologize if it is highly visible
> but I just missed it. Are there any pitfalls or special requirements? My
> primary goal is redundancy and it seemed this would be the best way to
> achieve that. What does the system at OU look like? What happens if your
> machine goes down?
>
> Regards, Glen
Glen,
I'm going to reply here on loncapa-admin so that others can correct me
and add to my answers.
I do not know of any real manual on setting up multiple machines in a
domain. You'll find bits of accumulated wisdom throughout
the loncapa-admin mailing list.
As you probably know, a machine is either a library server or an access
server. The library server holds all of the definitive records for a
domain: student records, published work, course information. If your
library server goes down, the best you can do is just get it up as fast
as possible. I do not know of anything simple one can do to run a quick
failover mode here. There may be some fancy solutions with
network-attached storage and a second machine. Stuart gets to weigh in here with
any ideas 8)
My solution is to have two machines that are configured in the exact
same way. If oucapa2 goes belly up, ohioua6 (capa10) has the exact same
hardware. As long as the disks are okay, I can plug them into capa10 and
turn it into capa2 without too much hassle. It's a lot easier to find a
smaller machine to take the place of capa10.
Access servers are a 'generic install'. There is no permanent data
stored on an access server. If one of my access servers goes down for
good, I'm fond of saying I just dropkick it across the room and can set
up another server fairly quickly.
Typically a domain will have one library server. If there is a load
problem, the accepted policy is to bring in two access servers as the
next step. Pulling in one access server and continuing to have students
log in to the library server as well isn't really recommended, as the
library server gets bogged down and doesn't do a very good job of
serving. Supposedly you don't really gain that much.
If you are having load problems, there are several things you can do:
* If it's a periodic peak load issue, you can make sure your spare.tab
lists access servers from other domains. In a pinch, students
should be able to log into any access server on the network and work on
their course from GWU. MSU places one or two of their machines in the
default spare.tab as does Colorado School of Mines. An access server
from Ohio University could also be placed in there. You may want to
check with the domain coordinator as a courtesy. You also want to be a
little careful to make sure the machines (and your machine) are fairly
up to date.
* If it's a significant enough load issue, you can get a couple of
access servers.
One simple way to load balance is to have the local DNS set up as a
round robin on some alias. I typically ask students to log into
http://loncapa.phy.ohiou.edu. Until just this summer, I would then have
this alias point to capa7, capa8 and capa10. Each of these will spill
over to capa6 as a spare. During summers when the load is lighter and I
might be working on the access servers, I can point the alias to my
library server and let everyone use that one machine. (Just beware that
students, even if told not to, will bookmark individual machines.)
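As a rough illustration of the round robin described above, here is a minimal Python sketch. It only simulates the rotation; a real round robin is done by the name server, which returns the address records for the alias in rotated order, and client-side caching makes the real split less even. The names ACCESS_SERVERS and assign_round_robin are hypothetical and chosen for this example only; the host names are the short names used in this message.

```python
from collections import Counter
from itertools import cycle

# Hosts behind the alias, using the short names from this message.
ACCESS_SERVERS = ["capa7", "capa8", "capa10"]


def assign_round_robin(hosts, n_logins):
    """Hand out hosts in strict rotation, ignoring how busy each one is."""
    rotation = cycle(hosts)
    return Counter(next(rotation) for _ in range(n_logins))


counts = assign_round_robin(ACCESS_SERVERS, 9)
# Each of the three hosts receives 3 of the 9 logins.
print(counts)

# Pointing the alias at a single machine (as in the summer) is the
# same rotation with a one-element list.
print(assign_round_robin(["capa2"], 9))
```

The point of the sketch is that the rotation never looks at load or availability: every host in the list gets the same share no matter what state it is in.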
This summer we took the dive and set up a balancing machine. MSU and FSU
use a balancer. You can dedicate a fairly small machine whose sole task
is to act as a gateway. It runs a complete LON-CAPA installation, but it
simply authenticates, determines what machines are available, and
switches the students to one of these machines. Right now I'm holding my
breath and using an old PowerEdge 1550, 800ish MHz with only 512 MB (I
should really have more memory. Even though it is lobotomized on this
machine, LON-CAPA still wants to start up the maxima servers, etc. Is
there any way to keep this down?). So far so good.
The balancer has a couple of advantages. A round robin simply rolls
through the list regardless of the actual load. Once a student is sent to
a particular machine, they stay on it. Lonbalancer at least pays attention
to the load on the machine and is more intelligent about which machine
it switches people to. With DNS round robin, if one of three machines is
down or LON-CAPA is not working on that one machine, a third of the
students don't get logged in. With the balancer, students are never sent
to that machine. Additionally, you have control over the spare table on
the balancer. I doubt any of us have direct control over our own DNS
information. It also takes a day or two for all the DNS servers to
refresh their caches in the outside world.
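To make the difference concrete, here is a small Python simulation of the two approaches when one of three access servers is down. It is a sketch under stated assumptions, not how lonbalancer is actually implemented: the "pick the least-loaded machine that is up" policy, the starting load numbers, and the function names round_robin and balanced are all hypothetical and used only for illustration.

```python
from itertools import cycle


def round_robin(hosts, up, n_logins):
    """Rotate blindly through hosts; a login sent to a down host fails."""
    rotation = cycle(hosts)
    return sum(1 for _ in range(n_logins) if next(rotation) not in up)


def balanced(hosts, up, load, n_logins):
    """Send each login to the least-loaded host that is up.

    This policy is an assumption made for the sketch.
    Returns (failed_logins, final_load).
    """
    load = dict(load)  # work on a copy so the caller's dict is untouched
    failed = 0
    for _ in range(n_logins):
        candidates = [h for h in hosts if h in up]
        if not candidates:
            failed += 1
            continue
        target = min(candidates, key=lambda h: load[h])
        load[target] += 1
    return failed, load


hosts = ["capa7", "capa8", "capa10"]
up = {"capa7", "capa10"}  # capa8 is down in this scenario
start_load = {"capa7": 40, "capa8": 0, "capa10": 10}  # made-up numbers

rr_failed = round_robin(hosts, up, 300)
bal_failed, final_load = balanced(hosts, up, start_load, 300)
print(rr_failed, bal_failed, final_load)
```

With these inputs the round robin fails 100 of the 300 logins (the third that were handed to capa8), while the balanced version fails none and evens out the load on the two machines that are up.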
So, in answer to your question about redundancy, I'm not sure there's
much to help with a failure of your library server besides making sure it
has RAID and perhaps other redundancies. Be prepared to roll it over to
another machine in an emergency.
Adding access servers will help if one of your access servers goes down.
The safest thing is to set up a balancing machine, which will make
failures in an access server mostly transparent to the students (unless
they were logged into the server at the time). So my suggestion would be
to add two access servers and a third, low-power machine as a balancer.
You may want to make sure one of the access servers could fairly quickly
turn into your library server in the case of an emergency.
Does this help, Glen?
Anyone else have words of advice about building up redundancy and capacity?
Later,
Mark
Ohio University