From: upham@cs.ubc.ca (Derek Upham)
Subject: Re: 8 bit clean implies what?
Date: 7 Feb 1993 18:32:33 -0800
jhallen@world.std.com (Joseph H Allen) writes:
>Yes. But here's another fly in the ointment: You shouldn't be so
>eurocentric... there are apparently versions of vt220s which display two
>successive characters as a single chinese or japanese character. So you
>need to make a mode where all deletes operate on two characters...
Actually, it gets worse than that. GB (mainland China) and Big5
(Taiwan) are two-byte sets, but the EUC encoding of Taiwan's CNS 11643
has FOUR-byte characters. In general, an application
looks at the high-order bit of byte "n". If it is zero, the byte
is interpreted as 7-bit ASCII. Otherwise it is interpreted as the
first byte of a multi-byte character in the alternate set. What's
more, there are various ways of using the high bits of successive
bytes to switch between character sets and save space (the specifics
now escape me, unfortunately). In general, if you want to be safe, do
everything on four-byte characters internally, and then add conversion
interfaces to work with whatever character set is needed.
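
To make that concrete, here is a rough sketch of one such conversion
interface in C. It is only an illustration under stated assumptions:
decode_to_wide is a made-up name, and the encoding is a simplified
EUC-style scheme (high bit clear means ASCII, high bit set means the
first of exactly two bytes). Real encodings have more cases than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from any real library.
 * Assumes a simplified EUC-style encoding: a byte with the high bit
 * clear is 7-bit ASCII; a byte with the high bit set is the first of
 * a two-byte character whose second byte also has the high bit set.
 * Each decoded character is stored in one fixed-width 32-bit slot.
 * Returns the number of characters written, or (size_t)-1 if the
 * input is malformed or dst is too small. */
size_t decode_to_wide(const unsigned char *src, size_t len,
                      uint32_t *dst, size_t cap)
{
    size_t i = 0, n = 0;

    assert(src != NULL && dst != NULL);
    while (i < len) {
        uint32_t ch;

        if ((src[i] & 0x80) == 0) {
            /* High bit clear: plain 7-bit ASCII, one byte. */
            ch = src[i];
            i += 1;
        } else {
            /* High bit set: need a second byte, also with high bit set. */
            if (i + 1 >= len || (src[i + 1] & 0x80) == 0)
                return (size_t)-1;
            ch = ((uint32_t)src[i] << 8) | src[i + 1];
            i += 2;
        }
        if (n == cap)
            return (size_t)-1;
        dst[n++] = ch;
    }
    return n;
}
```

Once the text is in the uint32_t array, deleting a character means
removing one array element, no matter how many bytes it took in the
external encoding.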
Derek
-- 
Derek Lynn Upham                         University of British Columbia
upham@cs.ubc.ca                          Computer Science Department
=============================================================================
"Ha! Your Leaping Tiger Kung Fu is no match for my Frightened Piglet Style!"