[LON-CAPA-dev] lond..non-preforking version.

Ron Fox lon-capa-dev@mail.lon-capa.org
Wed, 15 Jan 2003 07:59:33 -0500



  I have committed a non-preforking version of lond. It is a very
 preliminary test version; much still needs to be cleaned up as well
 as tested.  It's currently running on lonkashy (otherwise known as
 nscll1).  I would appreciate any bashing people can do against this
 daemon.
   A little history about why I went non-preforking:
 

  Preforking daemons are generally used, and useful, when the
connection rate is high.  The idea is to amortize process-creation
overhead across several connections, turning it into a small fraction
of the total connection time.  In LON-CAPA, as we all know, lonc is
there to maintain a (mostly) persistent connection.  This means that
the time-averaged connection rate is, in round numbers, 0/sec.

  OK, so preforking may have been an unnecessary design choice, but
why fix it if it wasn't broken?  Two reasons:
- It is broken.  There are evidently errors in the maintenance of the
  population of pre-forked servers, as evidenced by Guy's observation
  that the ELHS lond at some point got down to a single child and
  refused to spawn any more children.  Maintaining a child population,
  while seemingly simple, has a few subtle issues related to signals
  and how (un)reliable they may be under certain circumstances.
  Fixing the problem is probably harder than just removing the
  prefork population altogether.

- The work I have next involves removing the request serialization
  that occurs on the single lonc/lond connection that each system
  maintains with every other system.  To do this, lond will at some
  point need to spawn children without knowing in advance how many it
  will need.  Modifying a preforking server along those lines is
  possible, but even more complex than before.

- I like simple designs where possible: they are more understandable,
  more maintainable and more likely to work, and complexity can always
  come later if required.  (OK, that's three reasons.)
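On the signal subtlety in the first reason: ordinary (non-realtime) POSIX signals are not queued, so several children exiting close together can be reported by a single SIGCHLD, and a handler that reaps only one child per signal can lose track of its population. The usual defence is to loop on a non-blocking waitpid until nothing is left to collect. A minimal sketch in Python (lond itself is Perl), assuming Python 3.9+ for `os.waitstatus_to_exitcode`; the `children` and `exit_log` tables are hypothetical stand-ins for whatever bookkeeping lond actually keeps:

```python
import os
import signal

# Hypothetical bookkeeping, standing in for what make_child records:
# child pid -> description of the peer that child is serving.
children = {}
exit_log = []


def reap_children(signum=None, frame=None):
    """Reap every child that has exited so far and log it.

    Ordinary signals are not queued: several children exiting close together
    can produce a single SIGCHLD. So keep calling waitpid(WNOHANG) until it
    says there is nothing more to collect, instead of reaping one child per signal.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children still running, none waiting to be reaped
        peer = children.pop(pid, "unknown peer")
        reaped.append((pid, peer, os.waitstatus_to_exitcode(status)))
    exit_log.extend(reaped)
    return reaped


signal.signal(signal.SIGCHLD, reap_children)
```

Because the handler drains everything that is ready, a coalesced signal costs nothing: the next SIGCHLD (or a direct call) picks up any stragglers.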

A simple rundown of the changes required:
- lond's main loop of sleeping and forking children into the prefork
  population is gone.
- The main loop now consists of accepting connections and passing
  them to a thinly modified make_child.
- The thinly modified make_child forks, and captures the child
  information as before so child exits can be logged.
- The child process is lightly modified: instead of accepting
  connections and then validating them, it validates the single
  connection it has been handed by the parent and does transactions
  along that connection until the peer exits, at which time it too
  exits... resulting in log entries (after all, at this time lonc
  holds completely persistent and completely reliable connections,
  right ;-)).
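The accept-then-fork structure described above can be sketched as follows. This is a minimal Python illustration, not lond's Perl code: `KNOWN_HOSTS` is a hypothetical stand-in for the hosts.tab check, the challenge/response step is omitted, the one-line-per-transaction echo in `serve_peer` is a placeholder for real request handling, and `max_connections` exists only so the loop can be exercised in a test.

```python
import os
import socket

# Hypothetical stand-in for a hosts.tab lookup; the real child runs the
# challenge/response dialog instead of this simple address check.
KNOWN_HOSTS = {"127.0.0.1"}
children = {}  # pid -> peer address, so the parent can log child exits


def serve_peer(conn, addr):
    """Child side: validate the one connection we were handed, then serve
    it until the peer closes its end."""
    if addr[0] not in KNOWN_HOSTS:
        return
    with conn.makefile("rb") as requests:
        for request in requests:  # one newline-terminated transaction per line
            conn.sendall(b"ok " + request)  # placeholder for real handling


def accept_loop(listener, max_connections=None):
    """Parent side: accept a connection and fork a child for it, forever
    (or for max_connections connections, which is handy for testing)."""
    accepted = 0
    while max_connections is None or accepted < max_connections:
        conn, addr = listener.accept()
        pid = os.fork()
        if pid == 0:  # child: owns this one connection only
            listener.close()
            status = 1
            try:
                serve_peer(conn, addr)
                status = 0
            finally:
                conn.close()
                os._exit(status)
        conn.close()  # parent: the child has its own copy of the socket
        children[pid] = addr
        accepted += 1
```

Closing `conn` in the parent matters: if the parent kept its copy, the peer would never see EOF when the child exits, and the parent would leak a descriptor per connection. Likewise the child closes `listener` so it holds no needless reference to the listening socket.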

  As long as the parent process runs, new children will be created on
demand with no limit.  Note that this technically implies a hole for a
DoS attack: if I want to bring a LON-CAPA server down, all I really
need to do is write a program or script that keeps opening connections
to it on the lond port and holds them without sending any data.  Since
the challenge/response sequence has no timeout associated with it,
each lond child will stall.  Eventually I'll use up either the number
of sockets or the number of processes the system is allowed to create,
and the system as a whole will stall.  Note that this attack must
either come from a host that is in the hosts.tab file or from a system
that is spoofing that host's IP address.
  A remedy to consider for later would be:
- Put a timeout on the challenge/response dialog.
- Log timeouts globally.
- Feed the timeout information back to the master lond.
- Refuse connections, for some long time, from a host that has timed
  out more than x times in a row without a success.
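The first, second and fourth items can be sketched together. This is a hypothetical Python illustration: `TimeoutTracker`, `handshake`, the challenge bytes and the thresholds (`max_failures` plays the role of a threshold like x) are all made up for the example, and the peer's reply is not actually verified. In the forking design the timeout fires in a child while the refusal decision belongs to the parent, which is why the third item is needed; here both live in one process, and the child-to-parent report (for example over a pipe) is left out.

```python
import socket
import time


class TimeoutTracker:
    """Hypothetical per-host record of consecutive handshake timeouts.

    After max_failures timeouts in a row, a host is refused for ban_seconds.
    A successful handshake resets that host's streak.
    """

    def __init__(self, max_failures=5, ban_seconds=3600.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.ban_seconds = ban_seconds
        self.clock = clock
        self.failures = {}       # host -> consecutive timeouts so far
        self.banned_until = {}   # host -> clock value when the ban ends

    def allowed(self, host):
        until = self.banned_until.get(host)
        if until is None:
            return True
        if self.clock() >= until:  # ban expired: forget the old streak
            del self.banned_until[host]
            self.failures.pop(host, None)
            return True
        return False

    def record_timeout(self, host):
        self.failures[host] = self.failures.get(host, 0) + 1
        if self.failures[host] >= self.max_failures:
            self.banned_until[host] = self.clock() + self.ban_seconds

    def record_success(self, host):
        self.failures.pop(host, None)


def handshake(conn, host, tracker, challenge=b"challenge\n", timeout=30.0):
    """Send a challenge and wait at most `timeout` seconds for a reply.

    Only shows where the timeout goes; checking the reply is left out.
    """
    if not tracker.allowed(host):
        return False  # refused without tying up a process on this peer
    conn.settimeout(timeout)
    try:
        conn.sendall(challenge)
        reply = conn.recv(1024)
    except socket.timeout:
        tracker.record_timeout(host)  # the hook where timeouts get logged
        return False
    if not reply:
        return False  # peer closed during the handshake
    tracker.record_success(host)
    return True
```

The injectable `clock` is there so the ban window can be tested without waiting; in a daemon the default monotonic clock avoids surprises when the wall clock is adjusted.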

Since in theory there's already a legitimate lond/lonc connection,
this fix does not allow a node-to-node DoS attack to succeed...
especially if the rewritten lonc code only times out and closes
>additional< connections, and always keeps at least one connection
alive... there's a trade-off here against scalability... hmph.
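The lonc-side policy of closing only additional connections could look like the following hypothetical Python helper. Given how long each of a peer's open connections has been idle, it picks the stale ones to close, most idle first, while always leaving at least `keep_min` open. The function name, the idle-time input and the thresholds are assumptions for illustration and are not taken from any lonc code.

```python
def connections_to_close(idle_seconds, idle_limit, keep_min=1):
    """Pick which of one peer's open connections to close.

    idle_seconds[i] is how long connection i has been idle. Connections idle
    longer than idle_limit are candidates, most idle first, but at least
    keep_min connections always stay open. Returns indices in ascending order.
    """
    stale = sorted(
        (i for i, idle in enumerate(idle_seconds) if idle > idle_limit),
        key=lambda i: idle_seconds[i],
        reverse=True,
    )
    closable = max(0, len(idle_seconds) - keep_min)
    return sorted(stale[:closable])
```

For example, `connections_to_close([5, 100, 200], idle_limit=60)` returns `[1, 2]`, while `connections_to_close([100, 200], idle_limit=60)` returns `[1]`, keeping the less idle connection even though it is also stale. The helper only bounds the pool from below; an upper cap on connections per peer would be the other half of the scalability trade-off.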

