From: ketil@edb.tih.no (Ketil Albertsen,TIH)
Subject: Re: 8 bit clean implies what?
Date: Mon, 8 Feb 1993 09:07:53 GMT

In article <DAVIS.93Feb6132229@pacific.mps.ohio-state.edu>, davis@pacific.mps.ohio-state.edu
("John E. Davis") writes:
>As I understand it, an editor which is 8 bit clean can display ALL 256
>characters on the output device.
If you go by international standards (frequently called ANSI standards by the
US community... :->): No. Only the 190 (+space) graphic characters can be
displayed. There just aren't 256 character codes.
The 65 codes from 0 to 31 and from 127 (DEL) to 159 are NOT character codes but
control codes. The correct handling is to *process* them rather than to
display them. The processing may have an effect on the display, e.g. CR and
LF, both changing the active position, or ESC-sequences switching to a
different character set (among other things), but the codes are not, per se,
"displayed". The "processing" may be limited to simply storing (conserving)
them because the display or software does not support the defined function
for the control code.
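
A rough sketch of that split, in Python, for anyone who wants to count the
codes. It assumes the usual 8-bit layout (C0 controls at 0-31, space at 32,
DEL at 127, C1 controls at 128-159, everything else graphic); the function
name classify is my own invention, not taken from any standard.

```python
def classify(code):
    """Say which region of the assumed 8-bit code table a value falls in."""
    if not 0 <= code <= 255:
        raise ValueError("not an 8-bit code")
    if code <= 31:
        return "C0 control"
    if code == 32:
        return "space"
    if code == 127:
        return "DEL"
    if 128 <= code <= 159:
        return "C1 control"
    return "graphic"   # 33-126 and 160-255


# Tally how many codes land in each region.
counts = {}
for code in range(256):
    kind = classify(code)
    counts[kind] = counts.get(kind, 0) + 1

for kind in sorted(counts):
    print(kind, counts[kind])
```

Under these assumptions the tally comes out as 190 graphic codes plus space,
with the rest being control codes that an editor should process or conserve
rather than draw.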
Wrt. input: There should be no restriction on how you enter the control
functions. CR (13) has its own key (did you ever notice that uppercase
letters are entered by key combinations?), but there is nothing wrong with
entering ESC [ 1 2 m ("Second alternative font") as a menu choice rather
than as five separate hex values.
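
To make the menu idea concrete, here is a small Python sketch. The helper sgr
and the table MENU are hypothetical names of mine; the only thing taken from
the text is that the sequence is ESC [ 1 2 m, sent in its 7-bit form.

```python
ESC = 0x1B  # the ESC control code (27)


def sgr(parameter):
    """Build the 7-bit control sequence ESC [ <digits> m for one parameter."""
    return bytes([ESC]) + b"[" + str(parameter).encode("ascii") + b"m"


# Hypothetical menu table: the user picks a label, the editor stores the codes.
MENU = {"Second alternative font": sgr(12)}

seq = MENU["Second alternative font"]
print(len(seq), "codes:", seq.hex(" "))   # 5 codes: 1b 5b 31 32 6d
```

The user never sees the five code values; the editor inserts them into the
text when the menu entry is chosen.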
But obviously this assumes that you plan to honor ISO character code
definitions. So, many people would say that it is not "clean". But as
another poster commented, there is a distinction between a binary
editor and an 8-bit clean editor. If you want to be able to edit arbitrary
character sets, with arbitrary use of the control codes (CR/LF relocated
to other code positions...), then you need a binary editor. IMHO it is
sufficient for an editor anno 1993 to support ISO character sets -
preferably all of them.
>Is it just a coincidence that 255 displays
>nothing on my PC or is this a general feature? Should I make any assumptions
>regarding 255? I would like to reserve it for my own purposes.
In 8859/1, 255 is y with diaeresis. In 8859/2, /3 and /4 it is dot above.
Several character sets do not use 160 and 255 because it would prohibit
representation in a 7-bit environment, where the corresponding positions 32
and 127 are taken by space and DEL; ISO 2022 distinguishes between 94 and 96
character graphic sets.
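
If you have Python at hand you can check what 255 means in each part. This is
only a sketch and it assumes that Python's bundled iso8859_N codecs follow the
standards faithfully; name_of_255 is a made-up helper name.

```python
import unicodedata


def name_of_255(part):
    """Decode code 255 with ISO 8859 part `part` and return its Unicode name."""
    codec = "iso8859_%d" % part
    return unicodedata.name(bytes([255]).decode(codec))


for part in (1, 2, 3, 4):
    print("8859/%d: %s" % (part, name_of_255(part)))
```

So 255 is an ordinary graphic character in all four parts, which makes it a
poor choice as a private marker.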
Before you run out to buy the entire collection of ISO standards for character
sets and control functions: If you were to implement all of it, you'd have
enough to do for the rest of your life... Writing a binary editor may be
simpler.