[LON-CAPA-admin] Slowness in LonCapa

Mon Jun 22 13:47:23 EDT 2015

Hi Maged,

> Two days ago we got some reports about slowness in LonCapa

> 1. The discussion interface where we are seeing very long duration   
> associated with both discussion posts and requests

I switched a session to the library server for the uiuc domain, and  
submitted a discussion post to a resource in one of my courses, and  
did not find that it was slow.  Is this an issue on all servers in  
your domain (including the library server)?

> 2. Publishing, although not for all users.  The one example I have,   
> is that a power user made minor edits on 11 or 12 resources and   
> republished them,

For each publication event, I would compare two timestamps: the  
date/time recorded in the web server log for processing of the  
"Phase Two" call to /adm/publish (i.e., after the "Finalize  
Publication" button was pressed), and the date/time recorded in the  
.log file for the particular resource in the Authoring Space in  
/home/httpd/html/priv/domain/user/, within the heading:

================= Publish ||date/time|| Phase Two  ================
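
As a rough sketch (not part of LON-CAPA), the Phase Two headings could be
pulled out of a resource's .log file with a few lines of Python.  The
function name phase_two_stamps is hypothetical, and because the exact
date/time format in the heading is not spelled out here, the text is
returned as-is rather than parsed:

```python
import re

# Heading shape as quoted above.  The date/time text between "Publish" and
# "Phase Two" is captured verbatim; its exact format is an unknown here.
PHASE_TWO = re.compile(r"^=+\s*Publish\s+(.*?)\s+Phase Two\s*=+\s*$")


def phase_two_stamps(log_text):
    """Return the raw date/time text from each Phase Two heading."""
    stamps = []
    for line in log_text.splitlines():
        match = PHASE_TWO.match(line.strip())
        if match:
            stamps.append(match.group(1))
    return stamps
```

Each returned string can then be lined up by hand against the matching
/adm/publish entry in the web server log.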

There is an $r->rflush call to send output to the client after  
completion of the following items (logged in the .log file):

Write metadata file for ||filename||
Wrote metadata
Synchronized SQL metadata database
Removing error messages: ok
Creating old version ||number||
Copied old target to ||path||
Copied old target metadata to ||versioned metadata path||
Copied original source to ||resource path||
Copied original metadata to ||metadata path||

The corresponding output sent to the browser would end with:
Copied metadata

Thereafter, actions are added to the PerlCleanupHandler phase to  
notify subscribed servers.

That phase occurs after logging to the Apache web server logs, and  
after return of the response to the browser, so it should not factor  
into the time delays reported.  However, you might also look at the  
last modification time recorded for the .log file itself (which would  
be when the last subscription update response was written to the .log  
file).
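
For the last modification time, a minimal Python sketch follows.  The
function name log_last_modified is hypothetical, and the path to pass in
is whichever .log file you are checking:

```python
import os
from datetime import datetime


def log_last_modified(log_path):
    """Return the .log file's last modification time as a local datetime."""
    return datetime.fromtimestamp(os.path.getmtime(log_path))
```

The gap between the Phase Two heading time and this value gives a rough
idea of how long the subscription updates took after the response had
already been returned to the browser.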

> root@library1:/home/httpd/perl/logs# grep "CRITICAL: Forking server   
> for s10.lite.msu.edu" lonc.log
>
> 23 times from around 10am till 5pm

There is nothing unusual in that. It tells me that your particular  
server needed to make a connection to the msu library server to  
request data.  This could be for a number of reasons, including  
display of the Roles page by someone with a web session on your  
server, who has one or more roles in the msu domain, or browsing the  
msu resource space in the cross-institutional content repository.

After the CRITICAL: Forking server for s10.lite.msu.edu you should see:

SUCCESS: Created connection 1 to host s10.lite.msu.edu
INFO: Connected to lond version: 489
SUCCESS: Connection 1 to s10.lite.msu.edu now ready for action
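
To check that every fork was followed by a ready connection, something
like the following Python sketch could be run over lonc.log.  It assumes
only that the message text appears in each line as quoted above (any
leading timestamp or process prefix is ignored); the function name
unconfirmed_forks is hypothetical:

```python
import re

# Message text as quoted above; any prefix on the log line is ignored.
FORK = re.compile(r"CRITICAL: Forking server for (\S+)")
READY = re.compile(r"SUCCESS: Connection \d+ to (\S+) now ready for action")


def unconfirmed_forks(lines):
    """Count, per host, fork messages with no later 'ready for action' line."""
    pending = {}
    for line in lines:
        match = FORK.search(line)
        if match:
            host = match.group(1)
            pending[host] = pending.get(host, 0) + 1
            continue
        match = READY.search(line)
        if match and pending.get(match.group(1), 0) > 0:
            pending[match.group(1)] -= 1
    # Keep only hosts that still have an unmatched fork message.
    return {host: count for host, count in pending.items() if count > 0}
```

An empty result means each fork was matched by a later "ready for
action" line for the same host; any host left in the result would be
worth a closer look.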

> I am assuming that the balancer should have a lot less needs for   
> memory and CPU? As it only directs traffic to one of the access   
> servers?

Correct, the LON-CAPA balancer needs relatively little memory and  
CPU, since all it does is switch the session for an authenticated  
user to another server, as determined by the configuration in your  
domain.

At MSU, sessions for faculty and users with author/co-author roles in  
the msu domain are switched to the library server (s10) whereas other  
users are switched from the balancer to the least busy of the four  
access servers in the msu domain.

Stuart Raeburn
LON-CAPA Academic Consortium

Quoting "Abdel Messeh, Maged" <mmesseh at illinois.edu>:

> Hi Stuart,
>
> Two days ago we got some reports about slowness in LonCapa.  I   
> checked the server resources and nothing looked out of the ordinary.
>
> This slowness was exhibited by two behaviors:
>
> 1. The discussion interface where we are seeing very long duration   
> associated with both discussion posts and requests (this behavior   
> apparently existed before we upgraded to 2.11.1 and deployed our   
> lonBalancer).
> 2. Publishing, although not for all users.  The one example I have,   
> is that a power user made minor edits on 11 or 12 resources and   
> republished them, here are the times that he got from the time he   
> hit "finalize publication" to the time the return page showed up   
> (all in seconds):
>
> 40
> 40
> 130
> 90
> 15
> 15
> 15
> 8
> 90
> 15
> 12
>
> While there is of course some variation in complexity of the prelabs  
> he was publishing, we probably should not see that much variation in  
> times.  The problem published was probably as complicated as any of  
> the others.
>
> Looking through the logs I noticed several times with:
>
> root@library1:/home/httpd/perl/logs# grep "CRITICAL: Forking server   
> for s10.lite.msu.edu" lonc.log
>
> 23 times from around 10am till 5pm
>
> I would appreciate any insights into where I can look for the cause   
> of this problem.
>
>
> Also on the resource allocation, we have:
> Access servers - 12G RAM and 4vCPUs
> Library server - 10G RAM and 8vCPUs
>
> I am assuming that the balancer should have a lot less needs for   
> memory and CPU? As it only directs traffic to one of the access   
> servers?
>
> Many thanks,
>
> Maged