[tex-live] Re: UTF-8 support

Petr Olsak olsak@math.feld.cvut.cz
Wed, 22 Jan 2003 13:34:08 +0100 (CET)


On Wed, 22 Jan 2003, Vladimir Volovich wrote:

>  PO>   UTF-8 characters are interpreted by TeX as a sequence of
>  PO> commands, so don't use calls like \macro Á instead of \macro{Á}.
>
> it is always a good style to delimit macro arguments with braces
>
>  PO> It means that I won't completely switch all my old documents to
>  PO> UTF-8 because problems can occur! On the other hand, encTeX
>  PO> is a really robust solution.
>
> with encTeX, expansion of a multibyte UTF-8 character can also be not
> a single letter, but a sequence of several tokens (e.g. a call to
> macro), - so encTeX suffers from exactly the same "problem": you can't
> be sure that one UTF-8 character in the input file will be one token,

NO! Please, read the encTeX manual before this discussion.

>  PO> The second example: You have written that \write files include
>  PO> only the \'A notation of characters in LaTeX. Do you know documents
>  PO> where you have to re-read the \write files in verbatim mode? I
>  PO> know such documents. What happens in LaTeX in such a situation?
>
> nothing bad - it is very well possible to write to files in LaTeX
> using the ASCII LICR representation, and then read the files back:
> you'll need to translate \ into, say, \textbackslash, and characters
> like Á to \'A (which is a native representation in LaTeX); then, when
> you read the file back, all will be correct:
> * Á will be written as \'A, and read back as Á
> * \'A will be written as \textbackslash 'A, and read back as \'A
> so verbatim representation will be preserved.
> (fancyvrb package contains a lot of such framework)

The "\textbackslash dance" will help you if the native verbatim environment
is used. But if you first set all \catcodes to 12 (including that of the
backslash) and then \input the external file, no \textbackslash will help you.
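
The situation can be sketched in plain TeX roughly as follows. This is only
an illustration under assumptions: the macro name \verbinput and the file
name in the usage line are hypothetical, and an 8-bit engine with the plain
TeX \loop macro is assumed.

```tex
% Hypothetical helper: read a file with every character at catcode 12.
% The whole body is tokenized at definition time, so \input and \endgroup
% still work after the backslash itself has become an "other" character.
\def\verbinput#1{%
  \begingroup
    \count255=0
    \loop
      \catcode\count255=12   % includes backslash, braces, space, ^^M
      \ifnum\count255<255
        \advance\count255 by 1
    \repeat
    \input #1
  \endgroup}

% Usage (hypothetical file name):
% \verbinput{extfile.tex}
```

While the file is being read this way, a line such as \textbackslash 'A is
seen as a string of ordinary characters and is never executed as a control
sequence, so the translation made at \write time cannot be undone.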

Sorry, I am not a TeX novice, I _know_ what I am saying. The LaTeX
solution for UTF-8 encoding is not robust.

Petr Olsak