[LON-CAPA-admin] Only One Access Server Serving Maps?

Juan Eduardo Ramirez eduardo at charma.uprm.edu
Sat Aug 22 13:27:45 EDT 2009


Hi,
I am having similar problem in our access server. (The library is fine). 
Then I upgraded from 2.8.0 to 2.8.1 but still the problem persist.
I did not notice it until last night, when my students were complaining 
about "unable to find xxx.problem" for several problems from msu but I 
could not see it until I logged in  to the access server instead of 
library server.
This seems to be random, some folders show all problems and others does not.
Any hint? Why is failing copying if before was woking fine?
Eduardo
PS. I checked we have selinux disabled

Below is an extract of my /home/httpd/perl/logs/lonnet.log
Sat Aug 22 12:34:29 2009 (9824): caching of id -> 
dns%3a%2fadm%2fdns%2fdomain  failed
Sat Aug 22 12:35:03 2009 (9825): caching of id -> 
dns%3a%2fadm%2fdns%2fdomain  failed
Sat Aug 22 12:35:10 2009 (9827): <font color="blue">WARNING: LWP get: 
403 Forbidden: /home/httpd/html/res/msu/kashy/physicsLi
b02/06_Vectors_Scalars/threeForceVectors.problem</font>
Sat Aug 22 12:35:10 2009 (9827): <font color="blue">WARNING: LWP get: 
403 Forbidden: /home/httpd/html/res/msu/kashy/physicsLi
b02/06_Vectors_Scalars/threeForceVectors.problem</font>
Sat Aug 22 12:40:20 2009 (9868): caching of id -> 
dns%3a%2fadm%2fdns%2fdomain  failed
Sat Aug 22 12:40:20 2009 (9868): User 802077520 at uprm authorized by uprml1
Sat Aug 22 12:40:25 2009 (9823): Userfile repcopy failed for 
uploaded/uprm/17812314416384a4duprml1/default.sequence.meta
Sat Aug 22 12:40:25 2009 (9823): Userfile repcopy failed for 
uploaded/uprm/17812314416384a4duprml1/default_1250005874.sequenc
e.meta
Sat Aug 22 12:40:25 2009 (9823): Userfile repcopy failed for 
uploaded/uprm/17812314416384a4duprml1/default_1250704012.sequenc
e.meta
this is from /etc/httpd/logs/error_log
[Sat Aug 22 12:30:35 2009] [notice] caught SIGTERM, shutting down
[Sat Aug 22 12:30:45 2009] [notice] suEXEC mechanism enabled (wrapper: 
/usr/sbin/suexec)
[Sat Aug 22 12:30:46 2009] [warn] module perl_module is already loaded, 
skipping
[Sat Aug 22 12:30:48 2009] [notice] Digest: generating secret for digest 
authentication ...
[Sat Aug 22 12:30:48 2009] [notice] Digest: done
[Sat Aug 22 12:30:48 2009] [notice] Apache/2.2.3 (Scientific Linux) 
configured -- resuming normal operations
[Sat Aug 22 12:33:04 2009] [notice] SIGHUP received.  Attempting to restart
[Sat Aug 22 12:33:05 2009] [warn] module perl_module is already loaded, 
skipping
[Sat Aug 22 12:33:07 2009] [notice] Digest: generating secret for digest 
authentication ...
[Sat Aug 22 12:33:07 2009] [notice] Digest: done
[Sat Aug 22 12:33:07 2009] [notice] Apache/2.2.3 (Scientific Linux) 
configured -- resuming normal operations
[Sat Aug 22 12:35:10 2009] [error] access to 
/res/msu/kashy/physicsLib02/06_Vectors_Scalars/threeForceVectors.problem 
failed
for 72.50.35.48, reason: Replication failed for 
ramirez_125095884998279324_uprm_uprml1


Stuart Raeburn wrote:
> Paul
>
>> repcopy_userfile LWP result: 404 Not Found
>> repcopy_userfile LWP result: 500 Cannot write to
>> '/home/httpd/html/userfiles/ndsu/3H21488e15c894a96ndsul1/default.sequence.in.transfer': 
>>
>
>> Tue Aug 18 10:15:23 2009 (27523): Userfile repcopy failed for
>> uploaded/ndsu/3H214842a5d894a96ndsul1/default.sequence.meta
>
> The 404 Not Found matches the lonnet.log entry, and is expected (there
> is no .meta file for a default.sequence).
>
> The "500 Cannot write to ... Permission denied" for the userfile 
> repcopy of default.sequence sounds line an SELinux error.  Is SELinux 
> enabled?  If so set, it to permissive or disabled.  You might also 
> verify that the Apache web child processes are running as the www user.
>
> Stuart Raeburn
> MSU LON-CAPA group
>
>
> Quoting Paul Omernik <paul.omernik at ndsu.edu>:
>
>> Stuart,
>>
>> These are after the modification to lonnet.pm
>>
>> tail --lines=20 /etc/httpd/logs/error_log
>> [Tue Aug 18 05:10:24 2009] [notice] Digest: generating secret for digest
>> authentication ...
>> [Tue Aug 18 05:10:24 2009] [notice] Digest: done
>> [Tue Aug 18 05:10:24 2009] [notice] mod_python: Creating 4 session 
>> mutexes
>> based on 150 max processes and 0 max threads.
>> [Tue Aug 18 05:10:24 2009] [notice] Apache/2.2.3 (Red Hat) configured --
>> resuming normal operations
>> [Tue Aug 18 10:14:45 2009] [notice] SIGHUP received.  Attempting to 
>> restart
>> [Tue Aug 18 10:14:46 2009] [warn] module perl_module is already loaded,
>> skipping
>> [Tue Aug 18 10:14:48 2009] [notice] Digest: generating secret for digest
>> authentication ...
>> [Tue Aug 18 10:14:48 2009] [notice] Digest: done
>> [Tue Aug 18 10:14:48 2009] [notice] mod_python: Creating 4 session 
>> mutexes
>> based on 150 max processes and 0 max threads.
>> [Tue Aug 18 10:14:48 2009] [notice] Apache/2.2.3 (Red Hat) configured --
>> resuming normal operations
>> repcopy_userfile LWP result: 404 Not Found
>> repcopy_userfile LWP result: 500 Cannot write to
>> '/home/httpd/html/userfiles/ndsu/3H214842a5d894a96ndsul1/default.sequence.in.transfer': 
>>
>> Permission denied
>> repcopy_userfile LWP result: 404 Not Found
>> repcopy_userfile LWP result: 500 Cannot write to
>> '/home/httpd/html/userfiles/ndsu/3H21488e15c894a96ndsul1/default.sequence.in.transfer': 
>>
>> Permission denied
>> repcopy_userfile LWP result: 404 Not Found
>> repcopy_userfile LWP result: 500 Cannot write to
>> '/home/httpd/html/userfiles/ndsu/3H21482865c894a96ndsul1/default.sequence.in.transfer': 
>>
>> Permission denied
>> repcopy_userfile LWP result: 404 Not Found
>> repcopy_userfile LWP result: 500 Cannot write to
>> '/home/httpd/html/userfiles/ndsu/1H214826f5d894a41ndsul1/default.sequence.in.transfer': 
>>
>> Permission denied
>> repcopy_userfile LWP result: 404 Not Found
>> repcopy_userfile LWP result: 500 Cannot write to
>> '/home/httpd/html/userfiles/ndsu/3H21486c85d894a96ndsul1/default.sequence.in.transfer': 
>>
>> Permission denied
>>
>>
>> tail /home/httpd/perl/logs/lonnet.log
>> Tue Aug 18 10:15:23 2009 (27523): Userfile repcopy failed for
>> uploaded/ndsu/3H214842a5d894a96ndsul1/default.sequence.meta
>> Tue Aug 18 10:15:23 2009 (27523): Userfile repcopy failed for
>> uploaded/ndsu/3H214842a5d894a96ndsul1/default.sequence
>> Tue Aug 18 10:16:50 2009 (27526): Userfile repcopy failed for
>> uploaded/ndsu/3H21488e15c894a96ndsul1/default.sequence.meta
>> Tue Aug 18 10:16:50 2009 (27526): Userfile repcopy failed for
>> uploaded/ndsu/3H21488e15c894a96ndsul1/default.sequence
>> Tue Aug 18 10:16:55 2009 (27520): Userfile repcopy failed for
>> uploaded/ndsu/3H21482865c894a96ndsul1/default.sequence.meta
>> Tue Aug 18 10:16:55 2009 (27520): Userfile repcopy failed for
>> uploaded/ndsu/3H21482865c894a96ndsul1/default.sequence
>> Tue Aug 18 10:17:00 2009 (27522): Userfile repcopy failed for
>> uploaded/ndsu/1H214826f5d894a41ndsul1/default.sequence.meta
>> Tue Aug 18 10:17:00 2009 (27522): Userfile repcopy failed for
>> uploaded/ndsu/1H214826f5d894a41ndsul1/default.sequence
>> Tue Aug 18 10:17:04 2009 (27524): Userfile repcopy failed for
>> uploaded/ndsu/3H21486c85d894a96ndsul1/default.sequence.meta
>> Tue Aug 18 10:17:04 2009 (27524): Userfile repcopy failed for
>> uploaded/ndsu/3H21486c85d894a96ndsul1/default.sequence
>>
>>
>> ls -la /home/httpd/html/userfiles/ndsu/ | grep H214
>> drwxr-x---  4 www  www  4096 Aug 17 09:36 1H214826f5d894a41ndsul1
>>
>> Which is the one I copied from yesterday.
>>
>> I also see this in lonnet.log:
>>
>> Tue Aug 18 10:41:30 2009 (2906): caching of id ->
>> dns%3a%2fadm%2fdns%2fhosts  failed
>> Tue Aug 18 10:41:31 2009 (2906): caching of id ->
>> dns%3a%2fadm%2fdns%2fdomain  failed
>> Tue Aug 18 10:41:31 2009 (2906): Name iedulib.ipc.kanazawa-u.ac.jp no IP
>> found
>> Tue Aug 18 10:41:31 2009 (2906): Name lon-capa.colorado.edu no IP found
>> Tue Aug 18 10:41:31 2009 (2906): Name nyul1.physics.nyu.edu no IP found
>> Tue Aug 18 10:41:31 2009 (2906): Name 
>> shs-dunk-lin.shs.sarasota.k12.fl.us no
>> IP found
>> Tue Aug 18 10:41:31 2009 (2906): caching of id -> iphost%3aiphost  
>> failed
>>
>> I have been trying to remember what changed, but since I can't 
>> pinpoint a
>> date that this issue actually started, it will be tough to nail 
>> down.  Our
>> summer courses used triton (ndsua2), which was updated to CentOS 5.3 
>> in the
>> same way and at the same time as proteus (ndsua1), and triton is 
>> behaving
>> just fine.
>>
>> Paul
>>
>> On Mon, Aug 17, 2009 at 10:08 PM, Stuart Raeburn <raeburn at msu.edu> 
>> wrote:
>>
>>> Paul,
>>>
>>> I'm seeing the same thing when I log-into a LON-CAPA session on 
>>> ndsua1 and
>>> select an old course of mine, for which the home server is msul1, so 
>>> I don't
>>> think this issue is related to cloning or recent creation of 
>>> courses.  This
>>> seems more of a general issue to do with file replication on ndsua1.
>>>
>>> I suspect there may be a network issue of some sort which prevents the
>>> response to outgoing web requests from ndsua1 being received  again 
>>> by the
>>> server.  It seems that it's not a disk space, as these files are 
>>> very small,
>>> or a permissions issue as you tested that.
>>>
>>> As nsdua1 allows me to start a session on nsdua1 (which will 
>>> authenticate
>>> my log-in against the msu library server), connectivity via the 
>>> LON-CAPA
>>> port (5663) is evidently available.  Furthermore, nmap run against 
>>> port 5663
>>> reports that it is open.
>>>
>>> The missing file issue seems to be related to inbound traffic from 
>>> remote
>>> web servers, as what nsdua1 will do in the case of a 
>>> default.sequence file
>>> (which is an "uploaded" file) is to make an LWP request for the file 
>>> to the
>>> course's home server (the msu library server), which it will then store
>>> locally.
>>>
>>> I see the request from ndsua1 for the file in my MSU course in the 
>>> server
>>> logs on the MSU library server:
>>>
>>> 134.129.87.63 - - [17/Aug/2009:21:59:01 -0400] "GET
>>> /raw/uploaded/msu/33803011027874598msul1/default.sequence HTTP/1.1" 
>>> 200 4781
>>> "-" "libwww-perl
>>>
>>> and the logs suggest that the file was served successfully.
>>>
>>> The question is then:
>>>
>>> 1. Were the requested data ever received on ndsua1?
>>> 2. If so, what did ndsua1 do with the data?
>>>
>>> You may find some clues in /home/httpd/perl/logs/lonnet.log
>>>
>>> The fact that you found that the problem persisted after copying the
>>> missing file via the command line suggests that the LWP request for 
>>> the HEAD
>>> to check if the file exists also resulted in an error.
>>>
>>> This is the relevant code from lonnet.pm for that check.  I'm assuming
>>> that the file you copied was not deleted, but that instead, $lwpresp 
>>> was not
>>> "ok" but the return code was something other than a 404.
>>>
>>> my $lwpresp = &getuploaded('HEAD',$uri,$cdom,$cnum,\$info,\$rtncode);
>>> if ($lwpresp ne 'ok') {
>>> # there is no such file anymore, even though we had a local copy
>>> if ($rtncode eq '404') {
>>>    unlink($file);
>>> }
>>> return -1;
>>>
>>>
>>> In the case where the file has yet to be replicated, this is the 
>>> relevant
>>> chunk of code (from lonnet.pm which fetches the file
>>>
>>> my $transferfile=$file.'.in.transfer';
>>>
>>> $request=new
>>> HTTP::Request('GET',$protocol.'://'.&hostname($homeserver).'/raw/'.$uri); 
>>>
>>> my $response=$ua->request($request,$transferfile);
>>> # did it work?
>>> if ($response->is_error()) {
>>>    unlink($transferfile);
>>>    &logthis("Userfile repcopy failed for $uri");
>>>    return -1;
>>> }
>>> # worked, rename the transfer file
>>> rename($transferfile,$file);
>>>
>>> You could put some debugging lines in there to print out the status 
>>> of the
>>> LWP request, and also the result of the rename()   unfortunately 
>>> LON-CAPA
>>> doesn't check this presently.
>>>
>>> For example add this before the line:  unlink($transferfile);
>>>
>>> print STDERR "repcopy_userfile LWP result: 
>>> ".$response->status_line."\n";
>>>
>>>
>>> and maybe change the rename to do:
>>>
>>> if (!rename($transferfile,$file)) {
>>>    print STDERR "repcopy_userfile rename failed: ".$!."\n";
>>> }
>>>
>>> Then reload Apache and look in /etc/httpd/logs/error_log after 
>>> attempting
>>> to select a course, and retrieve the default.sequence file.
>>>
>>> Stuart Raeburn
>>> MSU LON-CAPA group
>>>
>>>
>>>
>>> Quoting Paul Omernik <paul.omernik at ndsu.edu>:
>>>
>>>  Of our two access servers, only one appears to be allowing mapping 
>>> to the
>>>> library server.  The one with issues (ndsua1) returns:
>>>>
>>>> The following problems occurred:
>>>> Map not loaded: The file
>>>>
>>>> /home/httpd/html/userfiles/ndsu/1H214826f5d894a41ndsul1/default.sequencedoes 
>>>>
>>>> not exist.
>>>>
>>>> Much like Lars, I clone the courses from the previous semester, and 
>>>> I can
>>>> find the previous hash on ndsua2, in fact, it's listed in the Course
>>>> Environment:
>>>>
>>>> /ndsu/2B66911d60f49493bndsul1
>>>>
>>>> and
>>>>
>>>> drwxr-x---  6 www  www  4096 Jun 24 09:32 2B66911d60f49493bndsul1
>>>>
>>>> (on ndsua2).
>>>>
>>>> The hash that ndsua1 can't find is located on ndsua2, so I attempted
>>>> scp'ing
>>>> the information over to ndsua1, matched the www:www ownership, and 
>>>> still
>>>> nothing.
>>>>
>>>> This 'inaccessibility' exists for all courses I've created recently.
>>>> Courses were created by logging in to the library server and cloning a
>>>> previous course.
>>>>
>>>> Thank you,
>>>>
>>>> Paul
>
>
>
> _______________________________________________
> LON-CAPA-admin mailing list
> LON-CAPA-admin at mail.lon-capa.org
> http://mail.lon-capa.org/mailman/listinfo/lon-capa-admin
> From - Sat




More information about the LON-CAPA-admin mailing list