[LON-CAPA-cvs] cvs: loncom / lond
foxr
lon-capa-cvs-allow@mail.lon-capa.org
Tue, 07 Oct 2008 10:08:12 -0000
This is a MIME encoded message
--foxr1223374092
Content-Type: text/plain
foxr Tue Oct 7 06:08:12 2008 EDT
Modified files:
/loncom lond
Log:
Documented log messages that can be emitted as POD entries.
--foxr1223374092
Content-Type: text/plain
Content-Disposition: attachment; filename="foxr-20081007060812.txt"
Index: loncom/lond
diff -u loncom/lond:1.408 loncom/lond:1.409
--- loncom/lond:1.408 Fri Sep 5 20:47:13 2008
+++ loncom/lond Tue Oct 7 06:08:06 2008
@@ -2,7 +2,7 @@
# The LearningOnline Network
# lond "LON Daemon" Server (port "LOND" 5663)
#
-# $Id: lond,v 1.408 2008/09/06 00:47:13 raeburn Exp $
+# $Id: lond,v 1.409 2008/10/07 10:08:06 foxr Exp $
#
# Copyright Michigan State University Board of Trustees
#
@@ -59,7 +59,7 @@
my $status='';
my $lastlog='';
-my $VERSION='$Revision: 1.408 $'; #' stupid emacs
+my $VERSION='$Revision: 1.409 $'; #' stupid emacs
my $remoteVERSION;
my $currenthostid="default";
my $currentdomainid;
@@ -7118,3 +7118,406 @@
Server/Process
=cut
+
+
+=pod
+
+=head1 LOG MESSAGES
+
+The messages below can be emitted in the lond log. This log is located
+in ~httpd/perl/logs/lond.log. Many log messages have HTML encapsulation
+to provide coloring if examined from inside a web page. Some do not.
+Where color is used, the colors are: Red for something to get excited
+about and to follow up on, Yellow for something to keep an eye on to
+be sure it does not get worse, and Green and Blue for informational items.
+
+In the discussions below, sometimes reference is made to ~httpd
+when describing file locations. There isn't really an httpd
+user; however, there is an httpd directory that gets installed in the
+place that user home directories go. On Linux, this is usually
+(always?) /home/httpd.
+
+
+Some messages are colorless. These are usually (not always)
+Green/Blue color level messages.
+
+=over 2
+
+=item (Red) LocalConnection rejecting non local: <ip> ne 127.0.0.1
+
+A local connection negotiation was attempted by
+a host whose IP address was not 127.0.0.1.
+The socket is closed and the child will exit.
+lond has three ways to establish an encryption
+key with a client:
+
+=over 2
+
+=item local
+
+The key is written to and read from a file.
+This is only valid for connections from localhost.
+
+=item insecure
+
+The key is generated by the server and
+transmitted to the client.
+
+=item ssl (secure)
+
+An SSL connection is negotiated with the client.
+The key is generated by the server and sent to the
+client across this SSL connection before the
+SSL connection is terminated and clear text
+transmission resumes.
+
+=back
+
+=item (Red) LocalConnection: caller is insane! init = <init> and type = <type>
+
+The client is local but has not sent an initialization
+string that is the literal "init:local". The connection
+is closed and the child exits.
+
+=item Red CRITICAL Can't get key file <error>
+
+SSL key negotiation is being attempted but the call to
+lonssl::KeyFile failed. This usually means that the
+configuration file is not correctly defining or protecting
+the directories/files lonCertificateDirectory or
+lonnetPrivateKey.
+<error> is a string that describes the reason that
+the key file could not be located.
+
+=item (Red) CRITICAL Can't get certificates <error>
+
+SSL key negotiation failed because we were not able to retrieve our certificate
+or the CA's certificate in the call to lonssl::CertificateFile.
+<error> is the textual reason this failed. Usual reasons:
+
+=over 2
+
+=item Apache config file for loncapa incorrect:
+
+One of the variables
+lonCertificateDirectory, lonnetCertificateAuthority, or lonnetCertificate
+is undefined or incorrect.
+
+=item Permission error:
+
+The directory pointed to by lonCertificateDirectory is not readable by lond.
+
+=item Permission error:
+
+Files in the directory pointed to by lonCertificateDirectory are not readable by lond.
+
+=item Installation error:
+
+Either the certificate authority file or the certificate has not
+been installed in lonCertificateDirectory.
+
+=back
+
+=item (Red) CRITICAL SSL Socket promotion failed: <err>
+
+The promotion of the connection from plaintext to SSL failed.
+<err> is the reason for the failure. There are two
+system calls involved in the promotion (one of which failed):
+a dup to produce
+a second fd on the raw socket over which the encrypted data
+will flow, and IO::Socket::SSL->new_from_fd, which creates
+the SSL connection on the duped fd.
+
+=item (Blue) WARNING client did not respond to challenge
+
+This occurs on an insecure (non SSL) connection negotiation request.
+lond generates a number from the time and the PID and sends it to
+the client. The client must respond by echoing this information back.
+If the client does not do so, that's a violation of the challenge
+protocol and the connection attempt fails.
+
+=item (Red) No manager table. Nobody can manage!!
+
+lond has the concept of privileged hosts that
+can perform remote management functions such
+as updating hosts.tab. The manager hosts
+are described in the
+~httpd/lonTabs/managers.tab file.
+This message is logged if this file is missing.
+
+
+=item (Green) Registering manager <dnsname> as <cluster_name> with <ipaddress>
+
+Reports the successful parse and registration
+of a specific manager.
+
+=item Green existing host <clustername:dnsname>
+
+The manager host is already defined in hosts.tab.
+The information in that table, rather than the info in the
+manager table, will be used to determine the manager's IP.
+
+=item (Red) Unable to craete <filename>
+
+lond has been asked to create new versions of an administrative
+file (by a manager). When this is done, the new file is created
+in a temp file and then renamed into place so that there are always
+usable administrative files, even if the update fails. This failure
+message means that the temp file could not be created.
+The update is abandoned, and the old file is available for use.
+
+=item (Green) CopyFile from <oldname> to <newname> failed
+
+In an update of administrative files, the copy of the existing file to a
+backup file failed. The installation of the new file may still succeed,
+but there will not be a backup file to revert to (this should probably
+be yellow).
+
+=item (Green) Pushfile: backed up <oldname> to <newname>
+
+See above. The backup of the old administrative file succeeded.
+
+=item (Red) Pushfile: Unable to install <filename> <reason>
+
+The new administrative file could not be installed. In this case,
+the old administrative file is still in use.
+
+=item (Green) Installed new < filename>.
+
+The new administrative file was successfully installed.
+
+=item (Red) Reinitializing lond pid=<pid>
+
+The lonc child process <pid> will be sent a USR2
+signal.
+
+=item (Red) Reinitializing self
+
+We've been asked to re-read our administrative files, and
+are doing so.
+
+=item (Yellow) error:Invalid process identifier <ident>
+
+A reinit command was received, but the target part of the
+command was not valid. It must be either
+'lond' or 'lonc' but was <ident>.
+
+=item (Green) isValideditCommand checking: Command = <command> Key = <key> newline = <newline>
+
+Checking to see if lond has been handed a valid edit
+command. It is possible that the edit command is not valid;
+in that case there are no log messages to indicate that.
+
+=item Result of password change for <username> pwchange_success
+
+The password for <username> was
+successfully changed.
+
+=item Unable to open <user> passwd to change password
+
+Could not rewrite the
+internal password file for a user.
+
+=item Result of password change for <user> : <result>
+
+A Unix password change for <user> was attempted
+and the pipe returned <result>.
+
+=item LWP GET: <message> for <fname> (<remoteurl>)
+
+The LWP (libwww-perl) fetch for a resource failed
+with <message>. The local filename that should
+have existed/been created was <fname>, and the
+corresponding URI was <remoteurl>. This is emitted in several
+places.
+
+=item Unable to move <transname> to <destname>
+
+From fetch_user_file_handler - the user file was replicated but could not
+be mv'd to its final location.
+
+=item Looking for <domain> <username>
+
+From user_has_session_handler - This should be a Debug call instead.
+It indicates lond is about to check whether the specified user has a
+session active on the specified domain on the local host.
+
+=item Client <ip> (<name>) hanging up: <input>
+
+lond has been asked to exit by its client. The <ip> and <name> identify the
+client system and <input> is the full exit command sent to the server.
+
+=item Red CRITICAL: ABNORMAL EXIT. child <pid> for server <hostname> died through a crass with this error->[<message>].
+
+A lond child terminated. Note that this termination can also occur when the
+child receives the QUIT or DIE signals. <pid> is the process id of the child,
+<hostname> the host lond is working for, and <message> the reason the child died
+to the best of our ability to get it (I would guess that any numeric value
+represents an errno value). This is immediately followed by:
+
+=item Famous last words: Catching exception - <log>
+
+Where <log> is some recent information about the state of the child.
+
+=item Red CRITICAL: TIME OUT <pid>
+
+Some timeout occurred for server <pid>. This is normally a timeout on an LWP
+request doing an HTTP GET.
+
+=item child <pid> died
+
+The reaper caught a SIGCHLD for the lond child process <pid>.
+This should be modified to also display the IP of the dying child,
+$children{$pid}.
+
+=item Unknown child 0 died
+
+A child died but the wait for it returned a pid of zero, which really should not
+ever happen.
+
+=item Child <which> - <pid> looks like we missed it's death
+
+When a SIGCHLD is received, the reaper process checks all children to see if they are
+alive. If children are dying quite quickly, the lack of signal queuing can mean
+that a signal heralds the death of more than one child. If so, this message indicates
+which other one died. <which> is the IP of a dead child.
+
+=item Free socket: <shutdownretval>
+
+The HUNTSMAN sub was called due to a SIGINT in a child process. The socket is being shut down.
+For whatever reason, <shutdownretval> is printed, but in fact shutdown() is not documented
+to return anything. This is followed by:
+
+=item Red CRITICAL: Shutting down
+
+Just prior to exit.
+
+=item Free socket: <shutdownretval>
+
+The HUPSMAN sub was called due to a SIGHUP. All children get killed, and lond execs itself.
+This is followed by:
+
+=item (Red) CRITICAL: Restarting
+
+lond is about to exec itself to restart.
+
+=item (Blue) Updating connections
+
+(In response to a USR2). All the children (except the one for localhost)
+are about to be killed, the hosts tab reread, and Apache reloaded via apachereload.
+
+=item (Blue) UpdateHosts killing child <pid> for ip <ip>
+
+Due to USR2 as above.
+
+=item (Green) keeping child for ip <ip> (pid = <pid>)
+
+In response to USR2 as above, the child indicated is not being restarted because
+it's assumed that we'll always need a child for the localhost.
+
+
+=item Going to check on the children
+
+Parent is about to check on the health of the child processes.
+Note that this is in response to a USR1 sent to the parent lond.
+There may be one or more of the next two messages:
+
+=item <pid> is dead
+
+A child that we have in our child hash as alive has evidently died.
+
+=item Child <pid> did not respond
+
+In the health check the child <pid> did not update/produce a pid_.txt
+file when sent its USR1 signal. That process is killed with a 9 signal, as it's
+assumed to be hung in some un-fixable way.
+
+=item Finished checking children
+
+Master process's USR1 processing is complete.
+
+=item (Red) CRITICAL: ------- Starting ------
+
+(There are more '-'s on either side). lond has forked itself off to
+form a new session and is about to start actual initialization.
+
+=item (Green) Attempting to start child (<client>)
+
+Started a new child process for <client>. <client> is an IO::Socket object
+connected to the child. This was as a result of a TCP/IP connection from a client.
+
+=item Unable to determine who caller was, getpeername returned nothing
+
+In child process initialization, either getpeername returned undef or
+a zero sized object was returned. Processing continues, but in my opinion,
+this should be cause for the child to exit.
+
+=item Unable to determine clientip
+
+In child process initialization. The peer address from getpeername was not defined.
+The client address is stored as "Unavailable" and processing continues.
+
+=item (Yellow) INFO: Connection <ip> <name> connection type = <type>
+
+In child initialization. A good connection was received from <ip>.
+
+=over 2
+
+=item <name>
+
+is the name of the client from hosts.tab.
+
+=item <type>
+
+is the connection type, which is one of:
+
+=over 2
+
+=item manager
+
+The connection is from a manager node, not in hosts.tab.
+
+=item client
+
+The connection is from a non-manager in hosts.tab.
+
+=item both
+
+The connection is from a manager in the hosts.tab.
+
+=back
+
+=back
+
+=item (Blue) Certificates not installed -- trying insecure auth
+
+One of the certificate file, key file, or
+certificate authority file could not be found for a client attempting
+SSL connection initiation. The connection will be attempted in insecure mode.
+(This would be a system with an up-to-date lond that has not gotten a
+certificate from us.)
+
+=item (Green) Successful local authentication
+
+A local connection successfully negotiated the encryption key.
+In this case the IDEA key is in a file (that is hopefully well protected).
+
+=item (Green) Successful ssl authentication with <client>
+
+The client (<client> is the peer's name in hosts.tab), has successfully
+negotiated an SSL connection with this child process.
+
+=item (Green) Successful insecure authentication with <client>
+
+
+The client has successfully negotiated an insecure connection with the child process.
+
+=item (Yellow) Attempted insecure connection disallowed
+
+The client attempted but failed to negotiate an insecure
+connection. This can happen either because the variable londAllowInsecure is false
+or undefined, or because the child did not successfully echo back the challenge
+string.
+
+
+=back
+
+
+=cut
--foxr1223374092--