Coordinating Diverse Components [was: Re: Linux usability]

Monty J. Harder lists at kc.rr.com
Mon Feb 18 23:11:50 CST 2002


"DCT Jared Smith" <jared at dctkc.com> wrote:

> IMHO Bruce Perens is probably second only to Larry Wall in his ability
> to be unterritorial about things other people like to fight over. You
> may think that he's territorial in the way he busted "Open Source" out
> from the "Free Software" movement, but I'll say this, free software was
> a rumour and a myth until he (and others) liberated it thusly.

  Yep.  I've always liked Larry's idea of "glue" languages and people - the
kinds of things Bruce works on fit the same definition.  And in order to be
a good Glue Person you have to accommodate the 8th and 9th layers of the
protocol stack - Politics & Religion.  I've seen something where I work that
is analogous to what's happened to *nix:

  Just a few years ago, my company's main software product was distributed,
maintained, and supported by independent sales organizations.  They decided
that vertical integration was the way to go, and started buying out those
ISOs.  This has allowed a larger number of customers to be able to draw upon
the expertise formerly scattered throughout those separate organizations,
but it's also exposed some coordination problems.

  Because of specialization, there is no longer a single person who knows a
client's setup intimately.  But those systems are still set up in their
diverse ways, and we are trying to move them toward some configuration
standards.  That transition itself is often difficult, because there are so
many pieces that fit together in often inexplicable ways.  A particularly
nasty example of this is PowerChute.

  As most of you probably know, PowerChute is the APC program that talks to
a UPS and makes sure a server shuts down gracefully before the battery
discharges in the event of a power failure.  This is a Good Thing.  But
there's a problem with PowerChute (at least on SCO systems) - if logins are
enabled on the serial port dedicated to communicating with the UPS, the
software becomes confused and will do one of two things:

    1.  Keep switching to battery, run the battery down, and shut down the
server properly, albeit unnecessarily.
    2.  Crash the server, forcing it to a hard reboot in a matter of seconds
without unmounting filesystems first.

Now, if I were writing a piece of software like this, I wouldn't just stop
at saying "Hey, User! if you install this thing on /dev/tty2a, don't enable
logins either there or on /dev/tty2A!", because there are just too many
things that can enable logins behind your back.  At one site in Nebraska
recently, a field tech got me on the phone to try to figure out what was
doing this.  I determined that the server he'd replaced had the modem on 2A
and the UPS on 1a, while the new server adhered to our standard of modem on
1A and UPS on 2a (case is indeed significant here).  Some of the EDI scripts
he copied from the old server had it hard-coded into them to ask the dialer
to use a modem on 2A.  No problem there, except that the Friendly Dialer,
after failing to find a modem on 2A, is kind enough to 're'-enable logins
on 2a!
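
  The kind of check I'd want the UPS software to run on its own can be
sketched in a few lines of Python.  This is only an illustration, not
anything APC or SCO ships: it assumes System V style inittab entries of the
form id:runlevels:action:process, it treats any "respawn" entry that names
the UPS port or its other-case twin as "logins enabled", and the sample
entries and function names are made up for the example.

```python
# Hypothetical sketch: spot inittab entries that would start a login
# process on the UPS serial line or on its other-case twin (tty2a / tty2A).


def sibling_ports(device):
    """Return both case variants of a serial device name."""
    base, unit = device[:-1], device[-1]
    return {base + unit.lower(), base + unit.upper()}


def login_enabled_on(inittab_text, ups_device):
    """Return the ids of entries that respawn a process on the UPS port pair.

    Assumes lines shaped like id:runlevels:action:process.
    """
    watched = sibling_ports(ups_device)
    offenders = []
    for line in inittab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":", 3)
        if len(fields) != 4:
            continue  # not an entry in the assumed format
        entry_id, _runlevels, action, process = fields
        if action != "respawn":
            continue
        # Compare each word of the command, minus any directory part.
        names = {word.rsplit("/", 1)[-1] for word in process.split()}
        if names & watched:
            offenders.append(entry_id)
    return offenders


# Invented sample entries, for illustration only.
SAMPLE = """\
Se1A:234:respawn:/etc/getty -t60 tty1A 3
Se2a:234:respawn:/etc/getty tty2a m
Se2A:234:off:/etc/getty -t60 tty2A 3
"""

print(login_enabled_on(SAMPLE, "tty2a"))
```

  Run before the UPS daemon starts (and again from cron), something like
this would at least turn a mystery crash into a logged complaint.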

  So every time one of these scripts ran, it set in motion a chain reaction
that inevitably crashed the server a few seconds later.  There is no
coordination between these different parts - no single place where the
setup/ini/rc files of the sundry components can be validated against each
other.  What we need is a well-defined interface for these pieces to fit
together, so that configuring a system doesn't get bogged down in
interdepartmental infighting.  While I think I might have come up with a fix
for the PowerChute problem (which, as I said before, really should have been
implemented by APC itself), it's really just one aspect of this larger
problem.  When these small ISOs configured everything on a system, they were
aware of the potential for problems.  Now we have these {de|com}partments
and a lot of problems fall through the cracks between the 'territories'.
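
  To make the "single place" idea concrete, here is a rough Python sketch
of validating scripts against one shared port map.  The map records the
standard mentioned above (modem on 1A, UPS on 2a); everything else, including
the dictionary layout, the function name, the script names, and the edi_dial
command in the sample, is hypothetical.

```python
import re

# One shared record of which serial line belongs to which component.
# The layout is an assumption; the assignments are the standard from the post.
PORT_MAP = {"modem": "tty1A", "ups": "tty2a"}

# Matches device names such as tty1A or tty2a, with or without /dev/.
TTY_PATTERN = re.compile(r"\btty\d+[a-zA-Z]\b")


def check_script(name, text, port_map=PORT_MAP):
    """Return warnings about serial devices hard-coded in a script."""
    ups = port_map["ups"]
    # Case matters, but both variants refer to the same physical line.
    ups_pair = {ups[:-1] + ups[-1].lower(), ups[:-1] + ups[-1].upper()}
    known = set(port_map.values())
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for device in TTY_PATTERN.findall(line):
            if device in ups_pair:
                warnings.append(
                    f"{name}:{lineno}: {device} is on the UPS line ({ups})"
                )
            elif device not in known:
                warnings.append(
                    f"{name}:{lineno}: {device} is not in the port map"
                )
    return warnings


# A made-up script line that hard-codes the old server's modem port.
for warning in check_script("edi_send.sh", "edi_dial --line /dev/tty2A\n"):
    print(warning)
```

  The point is less the regular expression than the fact that every
component gets checked against the same PORT_MAP, so a change to the standard
happens in one place instead of in a dozen scripts.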

  What to do?  Make some glue.  It isn't sufficient for these pieces to
function in a vacuum.  There needs to be a mechanism for documenting how
they fit together, and allowing adjustments in one piece to propagate
gracefully to the others when necessary.  I haven't been at my job long
enough to have accumulated a lot of Street Cred with the Guys in the Ties,
but fortunately, there's one particular person who's been recently promoted
into a position to do something about these coordination issues who always
gives me an honest technical answer to my "why do we do it this way"
questions, instead of a political/religious "Because God Almighty carved it
in stone tablets with His Own Hand and Thou shalt not question Him further
or thy soul be in peril" blow-off.

  And that's all it takes.  One person (a Stallman, Torvalds, Wall, etc.)
who can put his Official Seal of Approval on any plan to make sense out of
the mess.  When the project accumulates a certain critical mass (heh) it
becomes a standard that nobody would dare ignore.  And the system becomes
easier to configure, because instead of an individual human requiring
intimate knowledge of these details of a system, there is an "Admin in a
Script" to do it instead.

  </rant.>  A guy can dream, can't he?



