[LON-CAPA-dev] Running the 600 student quiz

Gerd Kortemeyer lon-capa-dev@mail.lon-capa.org
Mon, 16 Sep 2002 13:06:04 -0400


Hi,

Well ... it has been a rough ride running the 600 student quiz. The
table below shows the percentage server load on the LON-CAPA machines,
and you might be able to catch the network stats at
http://mrtg.cl.msu.edu/mrtg/campus/kedzie-rtr.2-day.png and
http://mrtg.cl.msu.edu/mrtg/campus/kedzie-rtr.2-week.png

 time      s1      s2      s3      s4     s10
 9:55    10.5     3.5     4.5       0     6.5
10:01       1     4.5       3       0     3.5
10:08       2      20      14       0     0.5
10:15     7.5    10.5     9.5       0       7
10:30    4800     290    1849     342      87
10:45     422     342     824     9.5    1007
11:00    5847    4827    6112     447    92.5
11:05    4963      45     113     104     124
11:15      17     363      27     305    2749
11:20     3.5    34.5    11.5     214     737
11:25    67.5     606     158    2130     861
11:35      25     417      43      10    13.5
11:40      27      22    37.5    11.5      18
11:50    38.5    31.5    1477       8    10.5
12:00    3400    43.5      48    33.5      80
12:05   296.5      32    24.5      16    51.5
12:12      18      29     9.5    13.5      37
12:19       6    23.5      13       2    51.5
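
To make the peaks in the table above easier to spot, here is a small
Python sketch that finds each server's highest sample and the times at
which any server was overloaded. The function names and the 100%
overload threshold are my own assumptions for illustration; they are not
part of LON-CAPA.

```python
# Load samples copied from the table above (percent server load).
LOAD_TABLE = """\
time s1 s2 s3 s4 s10
9:55 10.5 3.5 4.5 0 6.5
10:01 1 4.5 3 0 3.5
10:08 2 20 14 0 0.5
10:15 7.5 10.5 9.5 0 7
10:30 4800 290 1849 342 87
10:45 422 342 824 9.5 1007
11:00 5847 4827 6112 447 92.5
11:05 4963 45 113 104 124
11:15 17 363 27 305 2749
11:20 3.5 34.5 11.5 214 737
11:25 67.5 606 158 2130 861
11:35 25 417 43 10 13.5
11:40 27 22 37.5 11.5 18
11:50 38.5 31.5 1477 8 10.5
12:00 3400 43.5 48 33.5 80
12:05 296.5 32 24.5 16 51.5
12:12 18 29 9.5 13.5 37
12:19 6 23.5 13 2 51.5
"""


def parse_table(table):
    """Return (server_names, rows), where each row is (time, [loads])."""
    lines = table.strip().splitlines()
    servers = lines[0].split()[1:]
    rows = []
    for line in lines[1:]:
        time, *values = line.split()
        rows.append((time, [float(v) for v in values]))
    return servers, rows


def peak_loads(table):
    """Return {server: (peak_load, time_of_peak)} for every server column."""
    servers, rows = parse_table(table)
    peaks = {}
    for time, loads in rows:
        for server, load in zip(servers, loads):
            if server not in peaks or load > peaks[server][0]:
                peaks[server] = (load, time)
    return peaks


def overloaded_times(table, threshold=100.0):
    """Return the sample times at which any server exceeded the threshold.

    The 100% default is an assumed definition of "overload".
    """
    _, rows = parse_table(table)
    return [time for time, loads in rows if max(loads) > threshold]


if __name__ == "__main__":
    for server, (load, time) in peak_loads(LOAD_TABLE).items():
        print(f"{server}: peak {load} at {time}")
    print("overloaded at:", ", ".join(overloaded_times(LOAD_TABLE)))
```

With that assumed threshold, every sample from 10:30 through 12:05
except 11:40 has at least one machine above 100%.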


We were clearly overloaded at certain times. From today's test, these
would be my conclusions:

Immediate future:

* change the spare.tab entries on s1 through s3 to
   - include s1 through s3 themselves on all three
   - exclude the library server (s10)
   - include the demo and CBI machines, as well as both access servers
     in VU

* change spare.tab on s4 to only point to itself

* ask all instructors not to check grades while the exam is running.
Some large load peaks resulted from checking on the grades and pulling
up the complete course chart. It's like quantum mechanics: the
measurement changes the outcome. We had lots of people checking "how the
class is doing".

* ask students not to keep hitting "reload" if the server seems slow.
This is a big problem!!! Each reload only spawns an additional request
on the server, and the problem grows. Ask the students to be patient.

* look into the weird packets (weird requests from hosts at attbi.com
during the quiz) that showed up on the server console screen

Near future:

Have resource-intensive operations (statistics, chart, spreadsheet,
search) blocked in overload situations (send error 413 and have
Retry-After determined by load)
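
As a rough illustration of that idea, here is a minimal Python sketch of
such a gate. It is not LON-CAPA code. The function name, the 100%
threshold, and the way the delay scales with load (30 seconds per 100%
of load, capped at 10 minutes) are all assumptions picked for the
example. The 413 status is the one proposed above; 503 is another status
that is commonly sent together with Retry-After.

```python
import math

# Operations named in the proposal as resource-intensive.
HEAVY_OPERATIONS = {"statistics", "chart", "spreadsheet", "search"}


def check_request(operation, load_percent,
                  threshold=100.0, base_delay=30, max_delay=600):
    """Decide whether a request may run under the current server load.

    Returns (status, headers). Heavy operations are refused with 413 and
    a Retry-After header once load_percent exceeds threshold; everything
    else is let through with 200. threshold, base_delay and max_delay are
    assumed values, not measured ones.
    """
    if operation not in HEAVY_OPERATIONS or load_percent <= threshold:
        return 200, {}
    # Scale the wait with the load: base_delay seconds per `threshold`
    # percent of load, rounded up and capped at max_delay seconds.
    delay = min(max_delay, math.ceil(base_delay * load_percent / threshold))
    return 413, {"Retry-After": str(delay)}


if __name__ == "__main__":
    print(check_request("chart", 50))        # light load: allowed
    print(check_request("homework", 5000))   # not a heavy operation: allowed
    print(check_request("chart", 200))       # refused, short wait
    print(check_request("search", 6112))     # refused, wait capped
```

Ordinary homework requests always pass in this sketch, so students
taking the quiz are never turned away; only the expensive instructor
views are deferred.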

Thanks to all who kept an eye on the machines and answered email and
telephone, especially Felicia.

- Gerd.