[LON-CAPA-admin] lon-balancer
Stuart Raeburn
raeburn at msu.edu
Thu Oct 8 09:09:17 EDT 2015
Maged,
The &check_loadbalancing() routine in
/home/httpd/lib/perl/Apache/lonnet.pm is where the decision is made
about which server will be sent a newly logged-in user. That routine
loops over the designated servers to try to determine which one has
the lowest load. The &compare_server_load() routine is the one which
actually retrieves load data from each server.
There are two load types which may be considered when determining a
server's current load: the load average and the user load. The load
average is the first value in /proc/loadavg (i.e., the 1 minute
average), whereas the user load is the number of users with last
modification times for their files in /home/httpd/lonIDs within the
last 30 minutes (excluding "public" users).
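
To make the user load definition concrete, here is a minimal Python sketch (LON-CAPA itself does this in Perl, and this is not that code). It counts files in a directory whose modification time falls within the last 30 minutes. The function name count_recent_sessions and the "publicuser_" filename prefix used to skip public users are assumptions made for illustration; check the file names in your own lonIDs directory before relying on them.

```python
import os
import time


def count_recent_sessions(ids_dir, window_seconds=30 * 60,
                          is_public=lambda name: name.startswith("publicuser_"),
                          now=None):
    """Count files in ids_dir modified within the last window_seconds.

    The is_public test (a "publicuser_" prefix) is an assumption for this
    sketch; pass a different predicate if your file names differ.
    """
    now = time.time() if now is None else now
    count = 0
    for entry in os.scandir(ids_dir):
        # Skip anything that is not a regular file, and skip public users.
        if not entry.is_file() or is_public(entry.name):
            continue
        # Keep only files touched inside the time window.
        if now - entry.stat().st_mtime < window_seconds:
            count += 1
    return count
```

On an access server this could be called as count_recent_sessions("/home/httpd/lonIDs"), run as a user with read access to that directory.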
These load average and user load values are converted to load percent
and userload percent respectively by multiplying by 100 and dividing
by the perl vars set for lonLoadLim and lonUserLoadLim (unless zero)
in /etc/httpd/conf/loncapa.conf (CentOS/RHEL/Scientific Linux) or
/etc/apache2/loncapa.conf (Ubuntu). If lonUserLoadLim is zero then
the userload percent is not considered.
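
The conversion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Perl code in lonnet.pm; the function name load_percents and its argument names are made up for the example.

```python
def load_percents(loadavg, userload, lon_load_lim, lon_user_load_lim):
    """Return (load percent, userload percent).

    The userload percent is None when lon_user_load_lim is zero, meaning
    it is not considered.
    """
    load_pct = 100.0 * loadavg / lon_load_lim
    if lon_user_load_lim == 0:
        user_pct = None
    else:
        user_pct = 100.0 * userload / lon_user_load_lim
    return load_pct, user_pct
```

For example, a load average of 1.0 with lonLoadLim set to 2.0 gives a load percent of 50, and 30 recent users with lonUserLoadLim set to 60 gives a userload percent of 50.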
The determination of least loaded server is made by looping over the
servers defined for "primary" destinations, and then, if, and only if,
none of those are at less than 100% load, looping over servers defined
for "default" destinations.
The server with the lowest load percent will be the one selected. (If
both load percent and userload percent are being considered, the load
value compared with values from other servers is the larger of the two).
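
Here is a short Python sketch of the selection logic as described in the two paragraphs above. It is my paraphrase of the behavior, not a port of &check_loadbalancing() or &compare_server_load(); the function names and the host IDs in the usage note are hypothetical.

```python
def effective_load(load_pct, user_pct):
    """Use the larger of the two percentages; user_pct of None is ignored."""
    return load_pct if user_pct is None else max(load_pct, user_pct)


def pick_server(primary, default):
    """Pick the least loaded server.

    primary and default map a host ID to (load_pct, user_pct or None).
    The default group is only examined if no primary is below 100% load.
    """
    best_host, best_load = None, None
    for group in (primary, default):
        for host, (load_pct, user_pct) in group.items():
            load = effective_load(load_pct, user_pct)
            if best_load is None or load < best_load:
                best_host, best_load = host, load
        # Stop after the primaries if one of them is below 100% load.
        if best_load is not None and best_load < 100:
            break
    return best_host, best_load
```

With primary = {"a1": (120.0, None), "a2": (80.0, 90.0)} and default = {"d1": (10.0, None)}, pick_server() returns ("a2", 90.0): a2 is compared at 90% (the larger of its two values), and d1 is never examined because a primary is below 100%.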
The simplest setup is to have all access servers in your domain
(excluding the load balancer itself) identified as "primary", and have
the same lonLoadLim defined for each (the default is 2.0).
One way to test this from the command line would be to ssh into each
of your access servers, and then run:
cat /proc/loadavg
on each server, immediately before logging into the load balancer.
You could then verify that your session was transferred to the access
server with the lowest load percent. This would also be the access
server with the lowest 1 minute average from /proc/loadavg if all
access servers have the same lonLoadLim, and lonUserLoadLim is zero.
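
If you paste the output of cat /proc/loadavg from each server into a small script, you can work out which server the balancer would be expected to pick. The Python sketch below assumes lonUserLoadLim is zero everywhere, so only the first field of /proc/loadavg matters. The host names and sample values in the usage note are invented, and expected_target is a hypothetical helper, not part of LON-CAPA.

```python
def first_loadavg_field(text):
    """Return the first whitespace-separated field of /proc/loadavg output."""
    return float(text.split()[0])


def expected_target(loadavg_by_host, loadlim_by_host):
    """Return (host with lowest load percent, {host: load percent}).

    loadavg_by_host maps a host name to the raw text of /proc/loadavg;
    loadlim_by_host maps the same host name to its lonLoadLim value.
    """
    pct = {
        host: 100.0 * first_loadavg_field(text) / loadlim_by_host[host]
        for host, text in loadavg_by_host.items()
    }
    return min(pct, key=pct.get), pct
```

For instance, with {"access1": "0.80 0.60 0.50 1/300 4242", "access2": "0.40 0.70 0.90 2/310 4243"} and a lonLoadLim of 2.0 on both, the expected target is access2 at 20% (against 40% for access1).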
If you wanted to see how load balancing is behaving over a period of
time you could use the monitoring script -- monitoring.pl -- available
from the modules/raeburn directory in a check out of CVS (Lee Bynum at
Illinois has this script). If you wanted to use that with SSO, you
might need to customize the robotic log-in to work with Shibboleth;
the file as it exists currently works with MSU's CAS-like Sentinel
SSO. However, you can also set it to use /adm/login (i.e., not use
SSO).
That script has been used at MSU for over a decade, with a robotic
log-in to the load balancer every 5 minutes. One piece of data
recorded by that script is the lonHostID of the access server to which
the robotic user's session is transferred.
The percentages of all offloads to each of the four access servers
recorded for 2015 are:
30%
29%
24%
17%
Somewhat surprisingly, these are not all 25%.
However, real users can currently log in directly to each of the MSU
access servers, i.e., nothing is set in the "Login page requests
redirected" configuration for the msu domain in "Set domain
configuration" > "Log-in page options", so it is possible that users
are logging in directly to a particular machine (and raising the load
on that machine, on average).
I also use MRTG to display data on user load and load average gathered
from all msu access servers using scripts which are run every 5
minutes. Those load values show some small (~ 5%) variations in load
between servers.
If you wanted to record the load values that
lonnet::compare_server_load() is working with each time it is called,
you could add either some &logthis() calls (to log to
/home/httpd/perl/logs/lonnet.log) or some print STDERR statements (to
log to your Apache error log) to the compare_server_load() routine to
capture the values of $load and $try_server.
> Is there a way of checking if the balancer working right? And fixing
> it, if not?
You can manipulate the behavior of the balancer by setting different
values for lonLoadLim on each of the access servers, and/or by setting
different values for lonUserLoadLim.
Stuart Raeburn
LON-CAPA Academic Consortium
Quoting "Abdel Messeh, Maged" <mmesseh at illinois.edu>:
> Hi All,
>
> I have recently noticed that we have 4 or 5 times more requests
> going to one of our access nodes than the other 3.
>
> Is there a way of checking if the balancer working right? And fixing
> it, if not?
>
> Many thanks,
>
> Maged
>
>
> --------------------------
> Maged Messeh, Ph.D.
> College of Liberal Arts & Sciences Infrastructure
> University of Illinois at Urbana-Champaign
>
>
> _______________________________________________
> LON-CAPA-admin mailing list
> LON-CAPA-admin at mail.lon-capa.org
> http://mail.lon-capa.org/mailman/listinfo/lon-capa-admin