[tex-live] Re: UTF-8 support

Vladimir Volovich vvv@vsu.ru
Wed, 22 Jan 2003 15:26:19 +0300


"PO" == Petr Olsak writes:

 PO> I am not talking about LaTeX. My encTeX is a solution for all
 PO> macros, not only LaTeX.
 >> LaTeX is a macro package (i.e. it works purely with standard TeX
 >> features), so it shows that a similarly robust approach is
 >> possible in other macro packages too.

 PO> The LaTeX approach is not robust. This is a reason why I
 PO> developed my encTeX.

 PO> I cite from ucs.sty documentation:

 PO>   UTF-8 characters are interpreted by TeX as a sequence of
 PO> commands, so don't use calls like \macro  instead of \macro{}.

it is always good style to delimit macro arguments with braces.

 PO> It means that I don't completely switch all my old documents to
 PO> UTF-8 because problems can occur! On the other hand, encTeX
 PO> is a really robust solution.

with encTeX, the expansion of a multibyte UTF-8 character need not be
a single letter either; it can be a sequence of several tokens (e.g. a
macro call). so encTeX suffers from exactly the same "problem": you
can't be sure that one UTF-8 character in the input file will become
one token, so you cannot follow \macro with an unbraced UTF-8
character in encTeX either, unless you are sure that the character
will expand to a single character token and not to, say, \"a.
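
a minimal sketch of the brace issue, assuming pdfLaTeX with the
inputenc utf8 option (where a multibyte character reaches TeX as
several tokens). the name \macro and the character ä are only
illustrative choices, not taken from any package:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
% hypothetical one-argument macro, used only for this illustration
\newcommand{\macro}[1]{[#1]}
\begin{document}
% braced: the whole UTF-8 character is passed as the argument
\macro{ä}
% unbraced: only the first token of the multibyte sequence would be
% grabbed as the argument, so this typically fails with an inputenc
% error; it is left commented out so that the file compiles
% \macro ä
\end{document}
```

with the braces in place the argument is the same no matter how many
tokens the character turns into, which is the point of the advice
above.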

 PO> The second example: You have written that \write files include
 PO> only the \'A notation of characters in LaTeX. Do you know of
 PO> documents where you have to re-read the \write files in verbatim
 PO> mode? I know such documents. What happens in LaTeX in such a
 PO> situation?

nothing bad: it is entirely possible to write to files in LaTeX
using the ASCII LICR representation and then read the files back.
you'll need to translate \ into, say, \textbackslash, and characters
like Á into \'A (which is the native representation in LaTeX); then,
when you read the file back, everything will be correct:
* Á will be written as \'A, and read back as Á
* \'A will be written as \textbackslash 'A, and read back as \'A
so the verbatim representation will be preserved.
(the fancyvrb package contains a lot of this kind of framework)
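
the read-back half of these two cases can be tried directly. this is
only a sketch: the two lines stand in for the contents of a
hypothetical written file and are typed inline instead of being
\input from disk, and the macros that do the translation on output
are not shown:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}   % T1 provides a backslash glyph and accented letters
\begin{document}
% case 1: the LICR form of the accented letter
\'A                         % typesets A with an acute accent

% case 2: the escaped form of the literal input \'A
\texttt{\textbackslash 'A}  % typesets a backslash followed by 'A
\end{document}
```

the first line comes back as the accented letter and the second as
the literal characters, so the two cases stay distinguishable after
the round trip.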

 PO> Please, don't disseminate that UTF-8 solution in LaTeX is
 PO> robust. This is not true.

LaTeX is robust; you only need to use it consistently.

Best,
v.