[LON-CAPA-admin] repcopy failed and con_lost

Mike Budzik mikeb at purdue.edu
Fri Oct 23 10:09:06 EDT 2015


Thanks for your response.

Is there a way to not cache these failures? It seems better to retry for
the student instead of serving a cached error for 10 minutes.
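
To make the question concrete, here is a minimal sketch of the behavior I am
asking about: cache successful lookups for 10 minutes, but never store a
con_lost result, so the next request retries the library server. This is
not LON-CAPA code. TTLCache, get_user_resource_data and the fetch callback
are hypothetical names, and an in-process dict stands in for memcache.

```python
import time

CON_LOST = "con_lost"  # failure marker quoted in this thread


class TTLCache:
    """Tiny in-process stand-in for memcache (an assumption, for illustration)."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Entry is past its expiration time; drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def get_user_resource_data(cache, key, fetch):
    """Return cached data if present; otherwise fetch, caching only successes."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = fetch()
    if result != CON_LOST:
        # Only successful results are stored, so a failure is retried next time.
        cache.set(key, result)
    return result
```

The trade-off is that every request made during a real outage would hit the
library server again, which may be why the failure is cached today. A shorter
expiration for failures only would be a middle ground.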

At this point we do not have any other evidence of network layer issues.
Is there a way to tell why the access server thinks it can't talk to the
library server? Are there error messages we should be looking for on the
library node? Is that a long-lived connection that handles many
communications (we could monitor to see if that connection gets torn down)
or an ephemeral connection (we could monitor for SYN packets that don't
lead to successful connections)? Am I correct in thinking that we are
talking about connections to port 5663 on the library server?
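
In the meantime, one check we could run from the access server is a plain
TCP connect test against the library server. The sketch below assumes port
5663 is the right port (the question above) and only shows whether a new TCP
connection can be opened. It says nothing about the state of any existing
long-lived connection. The function name probe is hypothetical.

```python
import socket
import time


def probe(host, port=5663, timeout=5.0):
    """Try to open a TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution failures.
        return False, time.monotonic() - start, str(exc)
```

Calling probe() with the library server's hostname every minute or so and
logging the result would give us timestamps to compare against the con_lost
entries in lonnet.log.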

Thanks,
Mike B
On Oct 23, 2015 9:20 AM, "Stuart Raeburn" <raeburn at msu.edu> wrote:

> Mike,
>
>> User reports:
>>    My LON-CAPA account showing this status "Having technical difficulties;
>> please check status later" on every questions.
>>
>
> Yes, if the server/VM hosting the user's session was unable to connect
> to the purdue library server to retrieve user-specific parameters for
> resources in a course, then the result -- con_lost -- will be stored
> in memcache (unescaped key: userres:purdue:<username>) with an
> expiration time of 10 minutes.
>
> When the Course Contents screen is displayed, the status shown on the
> right side for each assessment item will be "Having technical
> difficulties" because the status would be "NETWORK_FAILURE", which is set
> when the result of the call to lonnet::get_userresdata() is con_lost --
> from an actual failed connection the first time, and from that cached
> result in memcache thereafter, until the cached value expires (after 10
> minutes).
>
> Each time lonnet::get_userresdata() is called and the result is
> "con_lost", the "Trying to get resource data for ... " message will be
> logged in lonnet.log.  If the con_lost response is from the cached item I
> see no benefit in displaying that message.  Accordingly, for the next
> LON-CAPA release I will look into suppressing those messages except when
> the initial con_lost state is encountered from an attempt to access the
> data from the user's homeserver.
>
>> [Wed Oct 21 13:28:00 2015] [error] access to
>> /res/purdue/purdue_math/math16020/Functions of Several
>> Variables/Differentials of Multivariable Functions/Problems/con_lost
>> failed
>> for []IP-ADDRESS], reason: Replication failed
>>
>
> The con_lost appended to the path here instead of an actual filename is
> odd.
>
> I looked in the web server log files on all the LON-CAPA servers I manage
> and found a single instance (on 10/8) of something similar, for a student
> from another domain whose session was being hosted on one of the MSU access
> servers.
>
> The request for the URL ending 'con_lost' appears in the logs with a
> timestamp one second after a successfully served request for an item with
> the same path -- /res/.../<filename> where filename is the name of a real
> file, instead of 'con_lost'.  This was after use of the LON-CAPA forward or
> backward navigation arrows to move to another resource.
>
>> Wed Oct 21 13:28:08 2015 (4932): Userfile repcopy failed for
>> uploaded/purdue/4o12229eff3c955c9purduel1/supplemental.sequence
>>
>
> Yes, you can expect to see "repcopy failed" messages for
> supplemental.sequence in the log file -- they do not actually represent a
> problem.  I will look into suppressing those messages specifically for
> supplemental.sequence for the next LON-CAPA release.
>
> When a user displays the Contents page, a check is made to see if there is
> any content in the "Supplemental" content area, by requesting the "top
> level" supplemental map -- supplemental.sequence.
>
> If no supplemental content has ever been added to that area of the course
> then the supplemental.sequence file will not exist, and a "Userfile repcopy
> failed" message will be logged.  That information will be cached (for the
> course) in memcache for 10 minutes, and no further requests for
> supplemental.sequence will be made in that course until the cached item
> has expired.
>
>
> Stuart Raeburn
> LON-CAPA Academic Consortium
>
> Quoting Mike Budzik <mikeb at purdue.edu>:
>
>> Can you help us understand these kinds of errors and what can be done to
>> improve the user experience?
>>
>> User reports:
>>    My LON-CAPA account showing this status "Having technical difficulties;
>> please check status later" on every questions.
>> 10 minutes later:
>>    I log in again after 10 minutes and its okay now.
>>
>> Here is what I found that seems to correspond to the user's experience
>> based on the times:
>> from error_log:
>> [Wed Oct 21 13:28:00 2015] [error] access to
>> /res/purdue/purdue_math/math16020/Functions of Several
>> Variables/Differentials of Multivariable Functions/Problems/con_lost
>> failed
>> for []IP-ADDRESS], reason: Replication failed for
>> [username]_1445446334193674160_purdue_purduel1
>>
>> from lonnet.log
>> This message is repeated 4135 times in under 5 minutes:
>> Wed Oct 21 13:28:00 2015 (3829): <font color="blue">WARNING: Trying to get
>> resource data for [username] at purdue: con_lost</font>
>>
>> There are also some like this:
>> Wed Oct 21 13:28:08 2015 (4932): Userfile repcopy failed for
>> uploaded/purdue/4o12229eff3c955c9purduel1/supplemental.sequence
>>
>
>
> _______________________________________________
> LON-CAPA-admin mailing list
> LON-CAPA-admin at mail.lon-capa.org
> http://mail.lon-capa.org/mailman/listinfo/lon-capa-admin
>